The Internet of Things on AWS – Official Blog

Using AWS IoT Analytics to Prepare Data for QuickSight Time-Series Visualizations

Introduction

Visualizing IoT data that can vary significantly over short periods of time (seconds) is important for several reasons: exploration and discovery of patterns, assessing trends and cyclicity, as well as observing potential correlations and anomalies. Insightful time-series visualizations can help identify anomalies, raise alerts based on these anomalies, and improve communication between various stakeholders, especially between data consumers and engineering.

In this blog post, I will show how to accomplish three common time-series analytics tasks:

  • Demonstrate how to process raw IoT data using native AWS IoT Analytics capabilities;
  • Provide examples of how to transform timestamp fields so that time-series data sets can be visualized easily;
  • Showcase several visualizations of IoT time-series data using Amazon QuickSight;

To accomplish these goals, we will use a real-life use case and leverage an open source data set.

It is assumed that the reader has hands-on experience with AWS IoT Core and  AWS IoT Analytics platform as well as the QuickSight business analytics service. Throughout this blog post we will use the AWS Command Line Interface (CLI) and for the initial set up of the CLI on your device please follow these instructions. Additionally, the reader should be able to set up IAM credentials and know how to run a Python program to publish messages up to the IoT Core.

IoT Data Set

IoT Analytics can ingest data generated by IoT devices and sensors in both direct and batch modes. When using direct ingestion mode, the IoT Core rules engine immediately routes the payload to an IoT Analytics channel. In batch ingest mode, data can be ingested from a JSON repository, such as an Amazon S3 bucket.

For the purposes of this blog post we will batch a data set from from an open source provided by the city of Melbourne, Australia to AWS IoT Core and then on to an AWS IoT Analytics channel.

The city of Melbourne deployed environmental sensors in 2014, measuring light levels, humidity and temperature in two locations: Fitzroy Gardens and Docklands Library. The data collected was used to better understand and communicate the impact of canopy cover for urban cooling. Data was collected from sensor readings, with temperature, light, humidity every 5 minutes at 8 locations (a trial period during 2014 and 2015).

We will tap into the raw JSON data collected from the sensors. Here is a sample of a single JSON payload from one of the sensors installed in Docklands Library.

{
 "boardid": "509",
 "boardtype": "1",
 "humidity_avg": "47.7",
 "humidity_max": "47.7",
 "humidity_min": "47.7",
 "latitude": "-37.8199043",
 "light_avg": "93.7",
 "light_max": "93.7",
 "light_min": "93.7",
 "location": "Docklands Library",
 "longitude": "144.9404851",
 "mac": "0013a20040b31583",
 "model": "ENV",
 "rowid": "509-20141216210500",
 "temp_avg": "13.9",
 "temp_max": "13.9",
 "temp_min": "13.9",
 "timestamp": "2014-12-16T21:05:00.000"
}

By inspecting the payload, we can see that the timestamp generated by this particular sensor has an ISO 8601 format (day and time) and we know it is in the Australian/Perth time-zone.

For our demo purposes, we will be using the first 1,000 sample JSON messages generated from the sensors at both locations (Fitzroy Gardens and Docklands Library). All sample JSON messages used during our blog can be found at the following data location (you can right click and then Save As to save the data set on your device).

High-Level AWS IoT Analytics Architecture

The JSON data will be sent to the AWS IoT Core on an MQTT topic and from there it will sent via the IoT rules engine to an AWS IoT channel. A pipeline with invoke a Lambda function that will transform the date-time into a different format and a data store will be created to store the data. Two data-sets will be created, one for each source location, and QuickSight visualizations will be created to represent time-series data.

  1. Publish raw JSON data to AWS IoT Core from the open source file;
  2. Add to each JSON message payload the following, given the existing Australian ISO 8601 timestamp (e.g. 2014-12-16 21:05:00.000):
    • UNIX epoch timestamp in UTC (1418677200);
    • USA Pacific time-zone ISO 8601 timestamp (12/15/2014 13:00:00.000);
  3. Store the JSON messages into a single AWS IoT Analytics datastore;
  4. Create two AWS IoT Analytics data sets by querying the datastore and filtering the data based on the two sensor locations (Fitzroy Gardens and Docklands Library);

Please note that for production workloads you may need to consider by-passing the AWS IoT Core and publish the JSON payload data directly into an AWS IoT Analytics channel using the BatchPutMessage API.

AWS IoT Data Ingestion, AWS IoT Analytics, and QuickSight Visualizations Steps

  1. Create a Lambda function to enrich the original payload and add two additional time formats as specified above (UNIX epoch and US PT);
  2. Create an IoT Analytics channel, a pipeline, and a data store;
  3. Create two IoT Analytics data sets;
  4. Using a Python script, ingest source data from the JSON file into IoT Core. This will trigger the rule to send the data to the IoT Analytics channel created at step 1;
  5. Import data sets and create time-series visualizations in QuickSight;

Step 1

Create a Lambda function (Python 2.7) called transform_pipeline with the code below.

import time
import sys
import pytz
import datetime as DT
from datetime import datetime
from pytz import timezone

def lambda_handler(event, context):
   for e in event:
       if 'timestamp' in e:
           # Converting timestamp string to datetime object
            d_aus_unaware = datetime.strptime(e['timestamp'],'%Y-%m-%dT%H:%M:%S.%f')

           # Adding Australia/Perth timezone to datetime object
            timezone_aus = pytz.timezone("Australia/Perth")
            d_aus_aware = timezone_aus.localize(d_aus_unaware)

           # Determining UTC epoch timestamp from input Australian timestamp
            d_utc = d_aus_aware.astimezone(timezone('UTC'))
            epoch = time.mktime(d_utc.timetuple())

           # Determining PST timestamp from input Australian timestamp
            d_uspst = d_aus_aware.astimezone(timezone('US/Pacific'))

           # Adding Epoch and PST timestamp to message payload
            e['timestamp_aus'] = d_aus_aware.strftime('%m/%d/%Y %T')
            e['timestamp_us'] = d_uspst.strftime('%m/%d/%Y %T')
            e['epoch'] = epoch

   return event

The function above (transform_pipeline) uses the pytz library which has to be imported as a package. For more information on how to do this, please visit the AWS Lambda Deployment Package in Python page.

We use the CLI to give IoT Analytics permission to call the above Lambda function. Please note that the statement-id is a unique statement identifier as documented here.

aws lambda add-permission --function-name transform_pipeline --statement-id stat_id_100 --principal iotanalytics.amazonaws.com --action lambda:InvokeFunction

Step 2

To create an IoT Analytics channel, a pipeline, and a data store, log on to the AWS console and follow the instructions that can be found here. Alternatively, you may elect to use a CloudFormation template to create IoT Analytics resources.

A channel is used to ingest data from IoT Core or an S3 bucket and feed a pipeline while keeping a copy of the raw messages for a selected period of time (or indefinitely).

For our use case, a single channel is created to store all incoming and unprocessed raw JSON messages and will be named mychannel_aus_weather. Using the IoT Analytics console, create the channel named mychannel_aus_weather. When prompted for an IoT Core topic, use iot/aus_weather and it will create a rule in IoT Core for you that passes messages into this channel. You can have the console create the IAM role needed there as well.

A pipeline is used to filter, enrich, and transform the incoming JSON payloads. The pipeline name is mypipeline_aus_weather. Create a pipeline where the data source is the channel mychannel_aus_weather. In Step 1 of this blog, you created a Lambda function named transform_pipeline. Add one Lambda activity to your pipeline and select this function from the list.

At the end of the pipeline creation flow, you can create your data store and then select it for the output of the pipeline. A data store is used to store data after it has been processed via pipelines. We will create a single data store for the purpose of our blog and will name it mydatastore_aus_weather.

Step 3

Let’s use the CLI to create the data sets from our data store. Alternatively, you can use the console and create the data set interactively.

Docklands Library Data Set

aws iotanalytics create-dataset --dataset-name="aus_weather_docklands_library" --cli-input-json file://dataset_docklands.json

Where the input is found in the following JSON file.

{
  "actions" :[{
    "actionName":"myaction1",
    "queryAction":{
      "sqlQuery":"select * from mydatastore_aus_weather where location = 'Docklands Library' order by timestamp_aus"
                  }
             }]
}

Fitzroy Gardens Data Set

aws iotanalytics create-dataset --dataset-name="aus_weather_fitzroy_gardens" --cli-input-json file://dataset_fitzroy.json

With the corresponding JSON file.

{
  "actions" :[{
    "actionName":"myaction2",
    "queryAction":{
      "sqlQuery":"select * from mydatastore_aus_weather where location = 'Fitzroy Gardens' order by timestamp_aus"
                  }
             }]
}

Please observe that timestamp_aus is an attribute that did not exist on the original data set. It was added by our Lambda function to each JSON payload. The order by statement was also added so that we can plot the data sequentially against this attribute in QuickSight.

Step 4

The previous steps have been used to set up the data ingestion, transformation, and structures for visualizations but we have no data yet. In this step, we will ingest the data from the open source data set to IoT Core.

We download from the open source data set the 1,000 data records programmatically.

curl -XPORT 'https://data.melbourne.vic.gov.au/resource/277b-wacc.json' > input_aus.json

Before we can pipe the data from the open source data set to the IoT Core we need to set up the CLI. Instructions for installing and configuring the CLI for Windows or OS can be found here.

During configuration, you will be asked to enter the AWS Access Key and the AWS Secret Access Key associated with your AWS user ID. To obtain the access keys, navigate to the AWS console and, in the search bar, search for IAM (Identity Access Management), then select users, click on your user name, and, under security credentials, select Create Access Key and make sure to copy and save the secret access key.

We write a Python function to ingest the data into IoT Core and name it upload_raw_aus.py.

import json
import boto3
import fileinput
import multiprocessing as mp
import os

processes = 4

# An array of boto3 IoT clients
IotBoto3Client = [boto3.client('iot-data') for i in range(processes)]

def publish_wrapper(lineID, line):
    # Select the appropriate boto3 client based on lineID
    client = IotBoto3Client[lineID % 4]

    line_read = line.strip()
    print "Publish: ", os.getpid(), lineID, line_read[:70], "..."
    payload = json.loads(line_read)

    # Publish JSON data to AWS IoT
    client.publish(
        topic='iot/aus_weather',
        qos=1,
        payload=json.dumps(payload))

if __name__ == '__main__':
    pool = mp.Pool(processes)
    jobs = []
    print "Begin Data Ingestion"
    for ID, line in enumerate(fileinput.input()):
       # Create job for each JSON object
       res = jobs.append(pool.apply_async(publish_wrapper, (ID, line)))

    for job in jobs:
       job.get()

    print "Data Ingested Successfully"

Before running the Python script above we need to install jq. If you have a Windows machine, please see the download instructions here.

sudo yum install jq

We can now use the CLI to invoke the function above and pass it the file downloaded programmatically with the curl command.

cat input_aus.json | jq -c '.[]' | python upload_raw_aus.py

The Python script uses four processes to ingest data in parallel. Four IoT boto3 clients are created and the job of publishing each raw JSON message is distributed among them in a round-robin fashion. Notice that the MQTT topic iot/aus_weather, is the same topic we used to create the topic rule, during step 2, for our channel subscription.

After the execution of the cat script above we have two data sets ready to be consumed by our QuickSight visualizations.

Step 5

AWS IoT Analytics Transformed Data Sets Used for Visualizations:

  • aus_weather_docklands_library: contains IoT data for Docklands Library
  • aus_weather_fitzroy_gardens: contains IoT data for Fitzroy Gardens

At this point, we open QuickSight and begin designing and creating time-series visualizations.

On QuickSight, click on New analysis, followed by New data set.

Next, click on AWS IoT Analytics.  In our QuickSight analysis, we have to add both data sets (aus_weather_docklands_library and aus_weather_fitzroy_gardens). For now, choose any one of them and select Create data source. Repeat the above steps for aus_weather_fitzroy_gardens. We can now visualize each data set separately or together.

In this blog, we will visualize them together. Click on one data set. For this blog, we will us the ‘aus_weather_docklands_library‘. Click on Create analysis.

The QuickSight visualization screen will appear. Let’s now add the other data set into our analysis (aus_weather_fitzroy_gardens). Click on the aus_weather_docklands_library data set. A pop block will appear in which if we click on Edit analysis data sets, we will have the option to add another data set. Add the aus_weather_fitzroy_gardens data set as shown in the diagrams below.

We can now see both data sets imported into our analysis.

Our analysis has all the data we need, and we can start to develop our visualizations. Let us first plot humidity_avg, light_avg and temp_avg v/s epoch for Docklands Library. Click on the aus_weather_docklands_library set. Select the line plot from the graph options in the lower left-hand corner. Above the graph drag the epoch field to the X axis and the humidity_avg, light_avg, temp_avg fields to the Value column.

For each value field, select the aggregate type as average (default is sum).

We now have a visualization that depicts the variation of humidity, temperature, and light over time. The blue window below the graph can be scrolled or expanded to analyze the visual at different time-intervals. We can also change the graph title by simply clicking and editing it.

Now, we can repeat the steps to visualize the data for Fitzroy Gardens as well. At the top left corner, click on the + button and select Add visual. A new blank visual is created below the existing visual graph. Select the aus_weather_fitzroy_gardens data set and we can now create the graph in the same manner as we created for aus_weather_docklands_library. The diagram below shows the final result.

The final result should look like below.

Both visualizations use the Epoch time format for the time horizontal) axis and we need to change that to use an ISO 8601 format. We have already added a timestamp_aus column that shows time in mm/dd/yyyy HH:MM:SS format. By default, QuickSight treats it as a date format. If we try to plot our data (humidity, light and temperature) versus this column in date format, we see the following visualization.

Unfortunately, the unit of measurement on the X axis is a day. Since we have sample data points at minute granularity, this is not an accurate representation of our IoT data. In order to get the precision we have at the data label, we need the date in a string format vs. the normal date format.

We accomplish this by taking the following steps.

  1. Changing the format of the timestamp_aus field from date to string.
  2. Ensuring the data is lexically sorted by the timestamp_aus column. We have taken care of this step when we created our data set query. Our query contains order by timestamp_aus to sort the data appropriately when the data set is created from the datastore (
    select * from mydatastore_aus_weather where location = 'Fitzroy Gardens' order by timestamp_aus

    ).

To accomplish step 1, click on the data set name (aus_weather_docklands_library) and click Edit analysis data sets. Next to aus_weather_docklands_library, click Edit. We can now edit any column as needed. Click on the column timestamp_aus and change its data format to string.

We will repeat this process for aus_weather_fitzroy_gardens data set. We can now plot the graph in the same way as we plotted with UNIX epoch timestamp. For X axis, we can use timestamp_aus instead of epoch. Our graphs will be the same as epoch timestamps but the X axis now shows timestamps in ISO 8601.

In QuickSight, we have a feature to create new data fields (columns) from existing ones. For example, we could change our plot above to represent the data for each day and month. If we observe the visuals above, we see that the sample points expand over 2 days (12/14/2014, 12/15/2014).

We can divide each plot into 2, based on the day by creating new calculated fields. Given that our timestamp_aus field has value in the string format ‘MM/DD/YYYY HH:MM:SS’, substring starting at character 12 and spanning over 5 characters will give us the hour and minute (HH:MM). Substring at character 1 and spanning over 10 characters will gives us the date (MM/DD/YYYY). We can extract these and create two new fields: HH:MM and MM/DD/YYYY.

Click on the data set name (aus_weather_docklands_library) and click Edit analysis data sets. Next to aus_weather_docklands_library, click Edit. On the left-hand corner, click on the Fields tab and then click New Field.

First, we will be creating a new field that takes out the hour and minute from the timestamp_aus field. For this, we will use the substring() provided by QuickSight. For more information on QuickSight functions, visit the QuickSight Functions Supported page. Click substring and type the name of the new field as HH:MM. Refer to the diagrams below. Repeat the same process to obtain the day and month into a new calculated field called MM/DD/YYYY.

Click on Save and Visualize at the top. We are now ready to create our plot. For simplicity, let us only plot humidity_avg. Select the line plot, drag HH:MM on the X axis, humidity_avg on the Value and MM/DD/YYYY on the Color column. The visual below is created and shows 2 line plots of humidity_avg, one for each day. Repeat the above steps for aus_weather_fitzroy_gardens data set.

Final Result for Docklands Library

Final Result for Fitzroy Gardens

QuickSight also helps us locate the sensors on a location map if we know the location coordinates (latitude, longitude). Our data sets includes these attributes. Click on the global icon on the bottom left hand corner and click on the latitude and longitude columns.

We now have a QuickSight analysis which consist of stories. We have shown how to 2 stories (showing timeseries data and geospatial location of sensors). Let’s rename our analysis. At the top our analysis name is given by default to be aus_weather_docklands_library. We can change that to something like ‘Melbourne Weather Analysis’. The analysis becomes accessible anytime on the new analysis page.

Visualizing data helps to improve communication between stakeholders and increase collaboration. QuickSight provides a useful snapshot tool, called dashboard, for sharing visuals with stakeholders. The dashboard tool serves as an important mechanism for capturing a visual and saving its state. It allows us to share the visual with other active users. Let’s create a dashboard for our analysis and share it with another QuickSight user. On the right-hand corner, click on share and then click on Create dashboard.

On the Create a dashboard page, type in the name we want to give to our dashboard (Melbourne IoT Weather Analysis) and then click Create dashboard.

On the next page, type in the username or email address of the user we want to share the dashboard with. Note that the user must have a QuickSight account registered already. The picture below shows us typing a fictitious user called Pat. If the username/email is valid, we can click Share. If sharing is not required, we can click Cancel. On the next page, we can allow users to create their own analysis by clicking Can create analyses next to the user. Click on Close. After that on the next page, Choose Confirm to confirm that we grant the users access to the data.

Users are sent an email with a link to the shared dashboard.

 

Conclusion

This blog provides a step-by-step guide on how to visualize time-series and geospatial data using QuickSight. We have also showed how to prepare data for visualizations using AWS IoT Analytics.

There are additional features, not discussed on this blog, that are provided by QuickSight to improve visualizations and interactions with data on visual elements. One example is the Filter tab (left-hand corner), which can be used to selectively display data and remove noise for example. Apart from line plot, QuickSight can also support many different types of graphs (bar graphs, pie charts, etc.) depending on the data that is being visualized. In general, the more the user is aware of the underlying data, the more useful and meaningful the analysis model and visualizations become.

In this blog, we’ve shown how to create a data flow using AWS IoT Analytics for our specific use case. However, it is critical to understand various other data flows that can be created on the platform for similar use cases and properly assess the implications of choosing one over the other. For more information on how to use other visualizations tools/services, please visit this blog.

If you have any questions or would like to share your own QuickSight visualizations, please visit our forum.