The Internet of Things on AWS – Official Blog
Importing historical equipment data into AWS IoT SiteWise
Introduction
AWS IoT SiteWise is a managed service that helps customers collect, store, organize, and monitor data from their industrial equipment at scale. Customers often need to bring historical equipment measurement data from existing systems, such as data historians and time series databases, into AWS IoT SiteWise to ensure data continuity, to train artificial intelligence (AI) and machine learning (ML) models that can predict equipment failures, and to derive actionable insights.
In this blog post, we will show how you can get started with the BulkImportJob API and import historical equipment data into AWS IoT SiteWise using a code sample.
You can use this imported data to gain insights through AWS IoT SiteWise Monitor and Amazon Managed Grafana, train ML models on Amazon Lookout for Equipment and Amazon SageMaker, and power analytical applications.
To begin a bulk import, customers upload a CSV file to Amazon Simple Storage Service (Amazon S3) containing their historical data in a predefined format. They then initiate an asynchronous import into AWS IoT SiteWise using the CreateBulkImportJob operation, and monitor its progress using the DescribeBulkImportJob and ListBulkImportJobs operations.
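The following minimal sketch, using the AWS SDK for Python (Boto3), illustrates this flow; the bucket names, object key, and role ARN are placeholders you would replace with your own:

```python
import boto3

sitewise = boto3.client("iotsitewise")

# Start an asynchronous import of one CSV file (placeholder names throughout).
job = sitewise.create_bulk_import_job(
    jobName="historical-import-001",
    jobRoleArn="arn:aws:iam::123456789012:role/SiteWiseBulkImportRole",
    files=[{"bucket": "my-data-bucket", "key": "data/1.csv"}],
    errorReportLocation={"bucket": "my-error-bucket", "prefix": "errors/"},
    jobConfiguration={
        "fileFormat": {
            "csv": {
                "columnNames": [
                    "ASSET_ID", "PROPERTY_ID", "DATA_TYPE",
                    "TIMESTAMP_SECONDS", "TIMESTAMP_NANO_OFFSET",
                    "QUALITY", "VALUE",
                ]
            }
        }
    },
)

# The job runs asynchronously; poll its status until it completes.
status = sitewise.describe_bulk_import_job(jobId=job["jobId"])["jobStatus"]
print(status)  # e.g. PENDING, RUNNING, COMPLETED, COMPLETED_WITH_FAILURES, FAILED
```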
Prerequisites
To follow along with this blog post, you will need an AWS account and an AWS Region supported by AWS IoT SiteWise. If you are already using AWS IoT SiteWise, choose a different Region so that you work in an isolated environment. You are also expected to have some familiarity with Python.
Set up the environment
- Create an AWS Cloud9 environment using the Amazon Linux 2 platform.
- Using the terminal in your Cloud9 environment, install Git and clone the aws-iot-sitewise-bulk-import-example repository from GitHub:

```
sudo yum install git
git clone https://github.com/aws-samples/aws-iot-sitewise-bulk-import-example.git
cd aws-iot-sitewise-bulk-import-example
pip3 install -r requirements.txt
```
Walkthrough
For the demonstration in this post, we will use an AWS Cloud9 instance to represent an on-premises developer workstation and simulate two months of historical data for a few production lines in an automobile manufacturing facility.
We will then prepare the data and import it into AWS IoT SiteWise at scale using several bulk import jobs. Finally, we will verify that the data was imported successfully.
A bulk import job can import data into the two storage tiers offered by AWS IoT SiteWise, depending on how the storage is configured. Before we proceed, let us first define these two storage tiers.
Hot tier: Stores frequently accessed data with lower write-to-read latency. This makes the hot tier ideal for operational dashboards, alarm management systems, and any other applications that require fast access to the recent measurement values from equipment.
Cold tier: Stores less-frequently accessed data with higher read latency, making it ideal for applications that require access to historical data. For instance, it can be used in business intelligence (BI) dashboards, artificial intelligence (AI), and machine learning (ML) training. To store data in the cold tier, AWS IoT SiteWise utilizes an S3 bucket in the customer’s account.
Retention Period: Determines how long your data is stored in the hot tier before it is deleted.
Now that we have learned about the storage tiers, let us understand how a bulk import job handles writes for different scenarios. Refer to the table below:
| Value | Timestamp | Write Behavior |
| --- | --- | --- |
| New | New | A new data point is created. |
| New | Existing | The existing data point is updated with the new value for the provided timestamp. |
| Existing | Existing | The import job identifies the duplicate data and discards it; no changes are made to existing data. |
In the next section, we will follow step-by-step instructions to import historical equipment data into AWS IoT SiteWise.
Steps to import historical data
Step 1: Create a sample asset hierarchy
For the purpose of this demonstration, we will create a sample asset hierarchy for a fictitious automobile manufacturer with operations across four different cities. In a real-world scenario, you may already have an existing asset hierarchy in AWS IoT SiteWise, in which case this step is optional.
Step 1.1: Review the configuration
- From the terminal, navigate to the root of the Git repo.
- Review the configuration for asset models and assets: `cat config/assets_models.yml`
- Review the schema for asset properties: `cat schema/sample_stamping_press_properties.json`
Step 1.2: Create asset models and assets
- Run `python3 src/create_asset_hierarchy.py` to automatically create asset models, hierarchy definitions, assets, and asset associations.
- In the AWS Console, navigate to AWS IoT SiteWise, and verify the newly created Models and Assets.
- Verify that you see an asset hierarchy similar to the one below. A simplified sketch of the API calls behind this step follows the list.
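Under the hood, the script uses the AWS IoT SiteWise modeling APIs. The following is a simplified, hypothetical sketch of those calls, not the script's actual code; the model name and property shown are examples only:

```python
import boto3

sitewise = boto3.client("iotsitewise")

# Create an asset model with one measurement property (example definition).
model = sitewise.create_asset_model(
    assetModelName="Sample_Stamping Press",
    assetModelProperties=[{
        "name": "Machine State",
        "dataType": "STRING",
        "type": {"measurement": {}},
    }],
)

# In practice, wait until the model status is ACTIVE before creating assets.
asset = sitewise.create_asset(
    assetName="Sample_Stamping Press A",
    assetModelId=model["assetModelId"],
)
# Parent-child links are then created with associate_assets(), using a
# hierarchy defined on the parent asset model.
```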
Step 2: Prepare historical data
Step 2.1: Simulate historical data
In this step, for demonstration purposes, we will simulate two months of historical data for four stamping presses across two production lines. In a real-world scenario, this data would typically come from source systems such as data historians and time series databases.
The CreateBulkImportJob API has the following key requirements:
- To identify an asset property, you will need to specify either an `ASSET_ID` + `PROPERTY_ID` combination or the `ALIAS`. In this blog, we will be using the former.
- The data needs to be in CSV format.
Follow the steps below to generate data according to these expectations. For more details about the schema, refer to Ingesting data using the CreateBulkImportJob API.
- Review the configuration for data simulation: `cat config/data_simulation.yml`
- Run `python3 src/simulate_historical_data.py` to generate simulated historical data for the selected properties and time period. If the total rows exceed `rows_per_job` as configured in `bulk_import.yml`, multiple data files will be created to support parallel processing. In this sample, about 700,000 data points are simulated for the four stamping presses (A-D) across two production lines (Sample_Line 1 and Sample_Line 2). Since we configured `rows_per_job` as 20,000, a total of 36 data files will be created.
- Verify the generated data files under the `data` directory. The data schema follows the `column_names` configured in the `bulk_import.yml` config file; a sample row is shown after this list.
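For reference, each row of a generated file carries one measurement value in the configured column order. A hypothetical sketch of writing such a row with Python's csv module (the IDs are placeholders, and the real files are produced by simulate_historical_data.py):

```python
import csv

# Write one sample row in the bulk import CSV schema (placeholder IDs).
with open("data/sample.csv", "w", newline="") as f:
    csv.writer(f).writerow([
        "<asset-id>",     # ASSET_ID of the target asset
        "<property-id>",  # PROPERTY_ID of the target property
        "DOUBLE",         # DATA_TYPE of the property
        1667275200,       # TIMESTAMP_SECONDS (Unix epoch)
        0,                # TIMESTAMP_NANO_OFFSET
        "GOOD",           # QUALITY
        42.5,             # VALUE
    ])
```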
Step 2.2: Upload historical data to Amazon S3
As AWS IoT SiteWise requires the historical data to be available in Amazon S3, we will upload the simulated data to the selected S3 bucket.
- Update the data bucket under `bulk_import.yml` with any existing temporary S3 bucket that can be deleted later.
- Run `python3 src/upload_to_s3.py` to upload the simulated historical data to the configured S3 bucket. A minimal sketch of this step follows the list.
- Navigate to Amazon S3 and verify that the objects were uploaded successfully.
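A minimal sketch of what such an upload step can look like; this is not the repository script itself, and the bucket name is a placeholder:

```python
import os

import boto3

s3 = boto3.client("s3")
bucket = "my-data-bucket"  # placeholder; use the data bucket from bulk_import.yml

# Upload every generated CSV file under the local data/ directory.
for name in os.listdir("data"):
    s3.upload_file(os.path.join("data", name), bucket, f"data/{name}")
    print(f"uploaded s3://{bucket}/data/{name}")
```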
Step 3: Import historical data into AWS IoT SiteWise
Before you can import historical data, AWS IoT SiteWise requires that you enable Cold tier storage. For additional details, refer to Configuring storage settings.
If you have already activated cold tier storage, consider changing the S3 bucket to a temporary one that can be deleted later when cleaning up the sample resources.
Note that when you change the S3 bucket, none of the data in the existing cold tier S3 bucket is copied to the new bucket. When modifying the S3 bucket location, also ensure that the IAM role configured under S3 access role has permissions to access the new bucket.
Step 3.1: Configure storage settings
- Navigate to AWS IoT SiteWise, select Storage, then select Activate cold tier storage.
- Pick an S3 bucket location of your choice.
- Select Create a role from an AWS managed template.
- Check Activate retention period, enter `30 days`, and save. A programmatic equivalent of these settings is sketched after this list.
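If you prefer to script this step, the same settings can be applied with the PutStorageConfiguration API. A hedged sketch, with placeholder bucket and role ARNs:

```python
import boto3

sitewise = boto3.client("iotsitewise")

# Activate cold tier storage with a 30-day hot tier retention period.
sitewise.put_storage_configuration(
    storageType="MULTI_LAYER_STORAGE",
    multiLayerStorage={
        "customerManagedS3Storage": {
            "s3ResourceArn": "arn:aws:s3:::my-cold-tier-bucket/",  # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/SiteWiseColdTierRole",  # placeholder
        }
    },
    retentionPeriod={"numberOfDays": 30, "unlimited": False},
)
```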
Step 3.2: Provide permissions for AWS IoT SiteWise to read data from Amazon S3
- Navigate to AWS IAM, select Policies under Access management, and choose Create policy.
- Switch to the JSON tab and replace the content with the following. Update `<bucket-name>` with the name of the data S3 bucket configured in `bulk_import.yml`. Note that both the bucket ARN and the `/*` object ARN are listed, so the role can read the objects as well as the bucket.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:*"],
            "Resource": [
                "arn:aws:s3:::<bucket-name>",
                "arn:aws:s3:::<bucket-name>/*"
            ]
        }
    ]
}
```

- Save the policy with the Name `SiteWiseBulkImportPolicy`.
- Select Roles under Access management, and choose Create role.
- Select Custom trust policy and replace the content with the following:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "iotsitewise.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

- Click Next and select the `SiteWiseBulkImportPolicy` IAM policy created in the previous steps.
- Click Next and create the role with the Role name `SiteWiseBulkImportRole`.
- Select Roles under Access management, search for the newly created `SiteWiseBulkImportRole`, and click on its name.
- Copy the ARN of the IAM role using the copy icon. An equivalent boto3 sketch follows this list.
Step 3.3: Create AWS IoT SiteWise bulk import jobs
- Update the `config/bulk_import.yml` file:
  - Replace `role_arn` with the ARN of the `SiteWiseBulkImportRole` IAM role copied in the previous step.
  - Replace `error_bucket` with any existing temporary S3 bucket that can be deleted later.
- Run `python3 src/create_bulk_import_job.py` to import historical data from the S3 bucket into AWS IoT SiteWise. The script creates multiple jobs to import all the data files into AWS IoT SiteWise simultaneously. In a real-world scenario, several terabytes of data can be quickly imported into AWS IoT SiteWise using concurrently running jobs.
- Check the status of the jobs from the script output; a monitoring sketch also follows this list. If you see the status of any job as `COMPLETED_WITH_FAILURES` or `FAILED`, refer to the Troubleshoot common issues section.
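One way to check the statuses programmatically is with the ListBulkImportJobs operation. A minimal sketch; pagination via nextToken is omitted for brevity:

```python
import boto3

sitewise = boto3.client("iotsitewise")

# Print the name and status of each bulk import job.
for summary in sitewise.list_bulk_import_jobs()["jobSummaries"]:
    print(summary["name"], summary["status"])
```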
Step 4: Verify the imported data
Once the bulk import jobs are complete, we need to verify that the historical data was successfully imported into AWS IoT SiteWise. You can verify the data either by looking directly at the cold tier storage or by visually inspecting the charts available in AWS IoT SiteWise Monitor.
Step 4.1: Using the cold tier storage
In this step, we will check whether new S3 objects have been created in the bucket that was configured for cold tier storage.

- Navigate to Amazon S3 and locate the S3 bucket configured under AWS IoT SiteWise → Storage → S3 bucket location (in Step 3) for cold tier storage.
- Verify the partitions and objects under the `raw/` prefix. A quick programmatic check follows this list.
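A quick programmatic spot check of the same bucket; the bucket name is a placeholder for your configured S3 bucket location:

```python
import boto3

s3 = boto3.client("s3")

# List the first few objects written by AWS IoT SiteWise under raw/.
resp = s3.list_objects_v2(Bucket="my-cold-tier-bucket", Prefix="raw/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```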
Step 4.2: Using AWS IoT SiteWise Monitor
In this step, we will visually inspect if the charts show data for the imported date range.
- Navigate to AWS IoT SiteWise and locate Monitor.
- Create a portal to access data stored in AWS IoT SiteWise:
  - Provide `AnyCompany Motor` as the Portal name.
  - Choose IAM for User authentication.
  - Provide your email address for Support contact email, and click Next.
  - Leave the default configuration for Additional features, and click Create.
  - Under Invite administrators, select your IAM user or IAM role, and click Next.
  - Click Assign Users.
- Navigate to Portals and open the newly created portal.
- Navigate to Assets and select an asset, for example, AnyCompany_Motor → Sample_Arlington → Sample_Stamping → Sample_Line 1 → Sample_Stamping Press A.
- Use Custom range to match the date range of the uploaded data.
- Verify the data rendered in the time series line chart. A programmatic alternative is sketched after this list.
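As an alternative to visual inspection, you can spot-check imported values with the GetAssetPropertyValueHistory API. In this sketch the asset ID, property ID, and date range are placeholders; use IDs from your own hierarchy and the range you simulated:

```python
from datetime import datetime

import boto3

sitewise = boto3.client("iotsitewise")

# Fetch a few historical values for one asset property (placeholder IDs).
resp = sitewise.get_asset_property_value_history(
    assetId="<asset-id>",
    propertyId="<property-id>",
    startDate=datetime(2022, 1, 1),   # placeholder start of the imported range
    endDate=datetime(2022, 3, 1),     # placeholder end of the imported range
    maxResults=10,
)
for point in resp["assetPropertyValueHistory"]:
    print(point["timestamp"]["timeInSeconds"], point["value"])
```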
Troubleshoot common issues
In this section, we cover common issues encountered while importing data using bulk import jobs, and highlight some possible causes.
If a bulk import job does not complete successfully, the best practice is to review the logs in the error S3 bucket configured in `bulk_import.yml` to understand the root cause.
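The error reports can be pulled down for inspection with a few S3 calls. In this sketch the bucket and prefix are placeholders for the values configured in `bulk_import.yml`:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-error-bucket"  # placeholder; use the configured error bucket

# Print the beginning of each error report written by the import jobs.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="errors/")
for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
    print(obj["Key"], body.decode()[:500])
```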
No data imported
- Incorrect schema: `dataType does not match dataType tied to the asset-property`
  The schema provided at Ingesting data using the CreateBulkImportJob API should be followed exactly. Using the console, verify that the provided DATA_TYPE matches the data type of the corresponding asset model property.
- Incorrect ASSET_ID or PROPERTY_ID: `Entry is not modeled`
  Using the console, verify that the corresponding asset and property exist.
- Duplicate data: `A value for this timestamp already exists`
  AWS IoT SiteWise detects and automatically discards any duplicate data points. Using the console, verify whether the data already exists.
Missing only certain parts of data
- Missing recent data: The BulkImportJob API imports recent data (data that falls within the hot tier retention period) into the AWS IoT SiteWise hot tier and does not immediately transfer it to Amazon S3 (cold tier). You may need to wait for the next hot-to-cold tier transfer cycle, which is currently set to 6 hours.
Clean Up
To avoid any recurring charges, remove the resources created in this blog. Follow these steps to delete them:

- Navigate to AWS Cloud9 and delete your environment.
- Run `python3 src/clean_up_asset_hierarchy.py` to delete the following resources, in order, from AWS IoT SiteWise:
  - Asset associations
  - Assets
  - Hierarchy definitions from asset models
  - Asset models
- From the AWS IoT SiteWise console, navigate to Monitor → Portals, select the previously created portal, and delete it.
- Navigate to Amazon S3 and perform the following:
  - Delete the S3 bucket location configured under the Storage section of AWS IoT SiteWise
  - Delete the data and error buckets configured in `config/bulk_import.yml` of the Git repo
Conclusion
In this post, you learned how to use the AWS IoT SiteWise BulkImportJob API to import historical equipment data into AWS IoT SiteWise using the AWS Python SDK (Boto3). You can also use the AWS CLI or the SDKs for other programming languages to perform the same operation. To learn more about all supported ingestion mechanisms for AWS IoT SiteWise, visit the documentation.
About the authors
Raju Gottumukkala is an IoT Specialist Solutions Architect at AWS, helping industrial manufacturers in their smart manufacturing journey. Raju has helped major enterprises across the energy, life sciences, and automotive industries improve operational efficiency and revenue growth by unlocking the true potential of IoT data. Prior to AWS, he worked for Siemens and co-founded dDriven, an Industry 4.0 Data Platform company.
Avik Ghosh is a Senior Product Manager on the AWS Industrial IoT team, focusing on the AWS IoT SiteWise service. With over 18 years of experience in technology innovation and product delivery, he specializes in Industrial IoT, MES, Historian, and large-scale Industry 4.0 solutions. Avik contributes to the conceptualization, research, definition, and validation of Amazon IoT service offerings.