AWS Big Data Blog
Test Your Streaming Data Solution with the New Amazon Kinesis Data Generator
October 2024: This post was reviewed and updated for accuracy.
When building a streaming data solution, most customers want to test it with data that is similar to their production data. Creating this data and streaming it to your solution can often be the most tedious task in testing the solution.
Amazon Kinesis Data Streams and Amazon Data Firehose enable you to continuously capture and store terabytes of data per hour from hundreds of thousands of sources. Amazon Managed Service for Apache Flink gives you the ability to use standard SQL to analyze and aggregate this data in real time. It’s easy to create an Amazon Kinesis data stream or Amazon Data Firehose delivery stream with just a few clicks in the AWS Management Console (or a few commands using the AWS CLI or Amazon Kinesis API). However, to generate a continuous stream of test data, you must write a custom process or script that runs continuously, using the AWS SDK or CLI to send test records to Amazon Kinesis. Although this task is necessary to adequately test your solution, it means more complexity and longer development and testing times.
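As a rough sketch of what such a custom producer involves, the following Python script uses boto3 (the AWS SDK for Python) to send fake sensor readings to a Kinesis data stream in a loop. The stream name, field names, and rate here are illustrative, not part of any AWS API:

```python
import json
import random
import time


def make_record():
    # Build one fake sensor reading; the field names are illustrative.
    return {
        "sensorId": random.randint(1, 50),
        "currentTemperature": random.randint(10, 150),
        "status": random.choice(["OK", "FAIL", "WARN"]),
    }


def stream_records(stream_name, records_per_second=10):
    # Continuously push test records to a Kinesis data stream.
    import boto3  # AWS SDK for Python; imported here so make_record stays testable offline

    kinesis = boto3.client("kinesis")
    while True:
        record = make_record()
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=str(record["sensorId"]),
        )
        time.sleep(1.0 / records_per_second)
```

Writing, running, and babysitting a script like this for every test is exactly the overhead the KDG removes.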
Wouldn’t it be great if there were a user-friendly tool to generate test data and send it to Amazon Kinesis? Well, now there is—the Amazon Kinesis Data Generator (KDG).
KDG overview
The KDG simplifies the task of generating data and sending it to Amazon Kinesis. The tool provides a user-friendly UI that runs directly in your browser. With the KDG, you can do the following:
- Create templates that represent records for your specific use cases
- Populate the templates with fixed data or random data
- Save the templates for future use
- Continuously send thousands of records per second to your Amazon Kinesis data stream or Amazon Data Firehose delivery stream
The KDG is open source, and you can find the source code in the Amazon Kinesis Data Generator repo on GitHub. Because the tool is a collection of static HTML and JavaScript files that run directly in your browser, you can start using it immediately without downloading or cloning the project. It is hosted as a static site on GitHub, and we created a short URL to access it.
To get started immediately, check it out at http://amzn.to/datagen.
Using the KDG
Getting started with the KDG requires only three short steps:
- Create an Amazon Cognito user in your AWS account (first-time only).
- Use this user’s credentials to log in to the KDG.
- Create a record template for your data.
When you’ve completed these steps, you can then send data to Amazon Kinesis Data Streams or Amazon Data Firehose.
Note: The KDG Help page includes a link to an AWS CloudFormation template that performs steps 1 and 2 for you. After this setup is complete, authenticated KDG users have permission, by default, to publish data to every Kinesis data stream and Firehose delivery stream in your AWS account. This broad access is likely not appropriate for production settings. To restrict users to publishing to only specific data streams or delivery streams, edit the resulting IAM roles. To find those roles, see the Resources tab of the resulting CloudFormation stack. For details on finding this information, see the CloudFormation documentation.
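For example, a scoped-down policy on the KDG user’s authenticated role might allow publishing to only one named data stream and one named delivery stream. The Region, account ID, and stream names below are placeholders, and this is a sketch rather than the exact policy the template creates:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
      "Resource": "arn:aws:kinesis:us-east-1:111122223333:stream/my-test-stream"
    },
    {
      "Effect": "Allow",
      "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
      "Resource": "arn:aws:firehose:us-east-1:111122223333:deliverystream/my-test-firehose"
    }
  ]
}
```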
Create an Amazon Cognito user
The KDG is a great example of a web application that uses Amazon Cognito for a user repository and user authentication, and the AWS SDK for JavaScript to communicate with AWS services directly from your browser. For information about how to build your own JavaScript application that uses Amazon Cognito, see Use Amazon Cognito in your website for simple AWS authentication on the AWS Mobile Blog.
Before you can start sending data to your Amazon Kinesis data stream, you must create an Amazon Cognito user in your account who can write to Amazon Kinesis Data Streams and Amazon Data Firehose. When you create the user, you create a username and password for that user. You use those credentials to sign in to the KDG. To simplify creating the Amazon Cognito user in your account, we created a Lambda function and a CloudFormation template. For more information about creating the Amazon Cognito user in your AWS account, see Configure Your AWS Account.
Note: It’s important that you use the URL provided by the output of the CloudFormation stack the first time that you access the KDG. This URL contains parameters needed by the KDG. The KDG stores the values of these parameters locally, so you can then access the tool using the short URL, http://amzn.to/datagen.
Log in to the KDG
After you create an Amazon Cognito user in your account, the next step is to log in to the KDG. To do this, provide the username and password that you created earlier.
On the main page, you can configure your data templates and send data to an Amazon Kinesis data stream or Amazon Data Firehose delivery stream.
The basic configuration is simple enough. All fields on the page are required:
- Region: Choose the AWS Region that contains the Amazon Kinesis data stream or Amazon Data Firehose delivery stream to receive your streaming data.
- Stream/firehose name: Choose the name of the data stream or delivery stream to receive your streaming data.
- Records per second: Enter the number of records to send to your data stream or delivery stream each second.
- Record template: Enter the raw data, or a template that represents your data structure, to be used for each record sent by the KDG. For information about creating templates for your data, see the “Create record templates” section later in this post.
When you set the Records per second value, consider that the KDG isn’t intended to be a data producer for load-testing your application. However, it can easily send several thousand records per second from a single tab in your browser, which is plenty of data for most applications. In testing, the KDG has produced 80,000 records per second to a single Amazon Kinesis data stream, but your mileage may vary. The maximum rate at which it produces records depends on your computer’s specifications and the complexity of your record template.
Ensure that your data stream or delivery stream is scaled appropriately for your target rate:
- An Amazon Kinesis data stream accepts up to 1,000 records per second, or 1 MB per second, per shard
- An Amazon Data Firehose delivery stream accepts up to 5,000 records per second, or 5 MB per second, by default (these quotas can be increased)
Otherwise, Amazon Kinesis may reject records, and you won’t achieve your desired throughput. For more information about adding capacity to a stream by adding more shards, see Resharding a Stream. For information about increasing the capacity of a delivery stream, see Amazon Data Firehose Limits.
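As a quick back-of-the-envelope check, you can estimate how many shards a test run needs from those per-shard limits. This is a sketch: it assumes uniform record sizes and ignores partition-key skew:

```python
import math


def shards_needed(records_per_second, avg_record_bytes):
    # Per-shard ingest limits for a Kinesis data stream:
    # 1,000 records per second and 1 MB per second.
    by_count = records_per_second / 1000.0
    by_bytes = (records_per_second * avg_record_bytes) / (1024 * 1024)
    return max(1, math.ceil(max(by_count, by_bytes)))
```

For example, 2,500 small records per second is bound by the record count and needs three shards, while 500 records per second of 5 KB each is bound by bytes and also needs three.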
Create record templates
The Record Template field is a free-text field where you can enter any text that represents a single streaming data record. You can create a single line of static data, so that each record sent to Amazon Kinesis is identical. Alternatively, you can format the text as a template, in which case the KDG substitutes portions of the template with fake or random data before sending each record. This lets you introduce randomness or variability into each record sent to your data stream. The KDG uses Faker.js, an open source library, to generate fake data. For more information, see the Faker.js project page on GitHub. The easiest way to see how this works is to review an example.
To simulate records being sent from a weather sensor Internet of Things (IoT) device, you want each record to be formatted in JSON. The following is an example of what a final record must look like:
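For example (the field values here are illustrative):

```json
{
  "sensorId": 17,
  "currentTemperature": 98,
  "status": "OK"
}
```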
For this use case, you want to simulate sending data from one of 50 sensors, so the sensorId field should be an integer between 1 and 50. The temperature value can range between 10 and 150, so the currentTemperature field should contain a value in this range. Finally, the status value can be one of three possible values: OK, FAIL, or WARN. The KDG template format uses mustache syntax (double curly braces) to enclose items that should be replaced before the record is sent to Amazon Kinesis. To model the record, the template looks like this:
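The helper names below follow the Faker.js and KDG documentation, but treat the exact calls as illustrative:

```
{
  "sensorId": {{random.number({"min":1, "max":50})}},
  "currentTemperature": {{random.number({"min":10, "max":150})}},
  "status": "{{random.arrayElement(["OK", "FAIL", "WARN"])}}"
}
```

Each time the KDG sends a record, it evaluates every double-curly-brace expression, so every record gets its own sensor ID, temperature, and status.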
Take a look at one more example, simulating a stream of records that represent rows from an Apache access log. A single Apache access log entry might look like this:
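For example (the IP address, timestamp, and path are invented for illustration):

```
192.168.0.25 - - [27/Oct/2024:16:32:11 -0500] "GET /index.html HTTP/1.1" 200 5043
```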
The following example shows how to create a template for the Apache access log:
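One possible template follows; the helper names and date format string are based on the Faker.js and KDG documentation, so treat this as a sketch:

```
{{internet.ip}} - - [{{date.now("DD/MMM/YYYY:HH:mm:ss ZZ")}}] "GET {{random.arrayElement(["/index.html", "/search", "/login"])}} HTTP/1.1" {{random.arrayElement(["200", "404", "500"])}} {{random.number({"min":100, "max":10000})}}
```

This substitutes a random client IP, the current timestamp, and a randomly chosen path, status code, and response size into each log entry.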
For more information about creating your own templates, see the Record Template section of the KDG documentation.
The KDG saves the templates that you create in your local browser storage. As long as you use the same browser on the same computer, you can reuse up to five templates.
Summary
Testing your streaming data solution has never been easier. Get started today by visiting the KDG hosted UI or its Amazon Kinesis Data Generator page on GitHub. The project is licensed under the Apache 2.0 license, so feel free to clone and modify it for your own use as necessary. And of course, please submit any issues or pull requests via GitHub.
If you have any questions or suggestions, please add them below.
Related
Scale Your Amazon Kinesis Stream Capacity with UpdateShardCount
About the Authors
Allan MacInnis is a Solutions Architect at Amazon Web Services. He works with our customers to help them build streaming data solutions using Amazon Kinesis. In his spare time, he enjoys mountain biking and spending time with his family.
Jared Warren is a Senior Solution Architect at Amazon Web Services, working with our Enterprise customers. Outside of work, he plays board games (the nerdier the better) and smokes bar-b-que in his backyard.
Matthew Kwan is an Associate Solutions Architect at Amazon Web Services (AWS). He is passionate about working with customers to dive deep into solving their business problems pertaining to analytics. He is driven to grow his analytics knowledge through building projects and solutions with an interest in industry verticals including sports and games. Outside of work, he enjoys playing video games, learning the bass guitar, and partaking in the occasional pickup basketball or volleyball game!
Audit History
Last reviewed and updated in October 2024 by Matthew Kwan | Associate Solutions Architect