The Internet of Things on AWS – Official Blog

Connected utility solutions for water and gas metering with AWS IoT

Introduction

Water meters are present at almost every location that consumes water, from residential houses to large-scale production plants. Avoiding water loss is increasingly important as water shortages become more frequent across all continents. Due to aging infrastructure, 30% of water flowing through pipes is lost to leaks (AWS announces 6 new projects to help address water scarcity challenges). Connected water metering solutions can help address this challenge.

Traditional water and gas meters are not connected to the cloud or the Internet. They also tend to implement industry-standard protocols, like Modbus or Profinet, which were first published in 1979 and 2003, respectively. While these protocols were not designed with cloud connectivity in mind, AWS and AWS partners offer solutions that can still help transfer utility data to the cloud.

Smart meters provide many advantages over traditional meters – including the opportunity to analyze consumption patterns for leaks or other inefficiencies that can lead to cost and resource savings. Having in-depth consumption reports helps companies to support their environmental sustainability goals and corporate social responsibility initiatives.

You can combine cloud-based services with connected meters to utilize predictive maintenance capabilities and enable automated analytics to identify emerging issues before they cause disruptions. This kind of automation helps streamline the analysis process and reduce the need for manual intervention.

This post presents a broadly applicable solution that uses pre-trained machine learning (ML) models to detect anomalies, such as leaks, in recorded data. To accomplish this, we use a real-world water meter example to illustrate how to integrate existing water and gas metering infrastructure through AWS IoT Greengrass into AWS IoT Core.

Solution Overview

Before diving into the actual solution, let’s review the system architecture and its components.

Figure 1: An overview of the solution architecture.

Figure 1 illustrates the AWS solution architecture. In this example, we use a standard electromagnetic water meter. This meter can be configured to either transmit analog signals or communicate with an IO-Link master. For simplicity, we use the analog outputs. Measurements from the flow meter are processed by a single-board computer – in this case a Raspberry Pi Zero W, because it is affordable and lightweight.

If you prefer, you can substitute any other device that can run AWS IoT Greengrass for the Raspberry Pi. Similarly, you can use another protocol to communicate with the meter. One option is Modbus, because it has an AWS-provided AWS IoT Greengrass component. For more information, see Modbus-RTU protocol adapter.

The incoming sensor data is processed on the edge device and then sent to AWS IoT Core using MQTT messages. The AWS IoT Rules Engine routes incoming messages to an AWS Lambda function. This Lambda function parses the message payload and stores individual measurements in Amazon Timestream. (Timestream, which is a time-series database, is ideal for this use case because it is well-integrated with Amazon Managed Grafana and Amazon SageMaker.) The Lambda function then calls several SageMaker endpoints that are used to compute anomaly scores for incoming data points.
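To make this concrete, the following is a minimal sketch of what such a Lambda function could look like in Python. The database, table, and endpoint names are illustrative assumptions, and the response parsing assumes the JSON format returned by the built-in Random Cut Forest endpoint described later in this post.

import json
import os
import time

import boto3

# Illustrative names; replace them with the resources in your own account.
DATABASE = os.environ.get("TIMESTREAM_DATABASE", "metering")
TABLE = os.environ.get("TIMESTREAM_TABLE", "measurements")
ENDPOINT = os.environ.get("SAGEMAKER_ENDPOINT", "water-flow-anomaly")

timestream = boto3.client("timestream-write")
sagemaker_runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # The AWS IoT rule passes the MQTT payload (see Listing 1) as the event.
    flow = float(event["response"]["flow"])
    temperature = float(event["response"]["temperature"])
    device_id = event["device_id"]
    now = str(int(time.time() * 1000))

    # Store both measurements as individual records in Amazon Timestream.
    timestream.write_records(
        DatabaseName=DATABASE,
        TableName=TABLE,
        Records=[
            {
                "Dimensions": [{"Name": "device_id", "Value": device_id}],
                "MeasureName": name,
                "MeasureValue": str(value),
                "MeasureValueType": "DOUBLE",
                "Time": now,
            }
            for name, value in [("flow", flow), ("temperature", temperature)]
        ],
    )

    # Ask a SageMaker endpoint for an anomaly score for the new flow value.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="text/csv",
        Body=str(flow),
    )
    score = json.loads(response["Body"].read())["scores"][0]["score"]
    return {"device_id": device_id, "anomaly_score": score}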

Figure 2: Data flow to AWS IoT Core.

Figure 2 illustrates how measurements flow from the water meter into AWS IoT Core. For this project and its sensor, two wires are used to receive two separate measurements (temperature and flow). Notably, the transmitted signal is just a voltage with a known lower and upper bound.

The Raspberry Pi Zero has only digital GPIO headers, so you must use an analog-to-digital converter (ADC) to make these analog signals usable. The sensor data component on the Raspberry Pi uses the ADC output to calculate the actual values through linear interpolation, based on the measured voltage and the sensor's known bounds. (Note that the sensor data component was written specifically for this architecture and is not a managed AWS IoT Greengrass component.) Finally, the calculated values, along with additional metadata like the device name, are sent to AWS IoT Core.
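As a minimal sketch of that conversion, assuming a 10-bit ADC, a 0–10 V sensor signal, and a 0–100 l/min flow range (all three are assumptions that depend on your hardware):

def interpolate(value, in_min, in_max, out_min, out_max):
    # Map a value from one known range onto another by linear interpolation.
    fraction = (value - in_min) / (in_max - in_min)
    return out_min + fraction * (out_max - out_min)

adc_reading = 512  # example raw sample from a 10-bit ADC (0-1023)

# Raw ADC reading -> voltage, assuming the sensor outputs 0-10 V.
voltage = interpolate(adc_reading, 0, 1023, 0.0, 10.0)

# Voltage -> flow, assuming the sensor maps 0-10 V to 0-100 l/min.
flow_l_per_min = interpolate(voltage, 0.0, 10.0, 0.0, 100.0)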

This architecture is flexible enough to support a wide array of meter types by adapting only the sensor data component. For use cases that involve collecting data from a larger number of meters, some modifications might be necessary. To learn more about the relevant architecture choices, see Best practices for ingesting data from devices using AWS IoT Core and/or Amazon Kinesis.

The following sections discuss the three main components of this solution.

Data Ingestion and Processing

To collect the meter data, the edge device polls the sensor at configurable intervals. After this data is processed on the device, a message payload (Listing 1) is sent to AWS IoT Core. Specifically, the AWS IoT Greengrass component uses the built-in MQTT messaging IPC service to communicate the sensor data to the broker.

{
    "response": {
        "flow": "1.781",
        "temperature": "24.1"
    },
    "status": "success",
    "device_id": "water_meter_42"
}

Listing 1: Sample MQTT message payload
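A condensed sketch of how a custom component could produce and publish this payload with the IPC client from the AWS IoT Device SDK v2 for Python is shown below. The topic name, device name, polling interval, and the read_sensor stub are placeholders.

import json
import time

from awsiot.greengrasscoreipc.clientv2 import GreengrassCoreIPCClientV2
from awsiot.greengrasscoreipc.model import QOS

TOPIC = "meters/water_meter_42/readings"  # placeholder topic name

ipc_client = GreengrassCoreIPCClientV2()

def read_sensor():
    # Placeholder for the ADC-based measurement described in Figure 2.
    return 1.781, 24.1

while True:
    flow, temperature = read_sensor()
    payload = {
        "response": {"flow": str(flow), "temperature": str(temperature)},
        "status": "success",
        "device_id": "water_meter_42",
    }
    # Publish to AWS IoT Core through the built-in MQTT messaging IPC service.
    ipc_client.publish_to_iot_core(
        topic_name=TOPIC,
        qos=QOS.AT_LEAST_ONCE,
        payload=json.dumps(payload).encode(),
    )
    time.sleep(60)  # polling interval; make this configurable in practice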

Once the message arrives at the broker, an AWS IoT rule triggers and relays the incoming data to a Lambda function. This function stores the data in Timestream and retrieves anomaly scores. Storing the data in a time-series database ensures that a historical view of measurements is available. This is helpful if you also want to perform analyses on historical data, train machine learning models, or just visualize previous measurements.
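For illustration, an AWS IoT rule like the one described above could be created with the AWS SDK for Python (boto3) as sketched below; the rule name, topic filter, and function ARN are placeholders. Note that the Lambda function also needs a resource-based policy that allows AWS IoT to invoke it.

import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="water_meter_ingest",  # placeholder rule name
    topicRulePayload={
        # Forward the full payload of every meter reading to the Lambda function.
        "sql": "SELECT * FROM 'meters/+/readings'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "lambda": {
                    "functionArn": "arn:aws:lambda:eu-central-1:123456789012:function:meter-ingest"
                }
            }
        ],
    },
)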

Data Visualization

Visualizing historical data can help with data exploration and manual sanity checks, if desired. For this solution, we use Amazon Managed Grafana to provide an interactive visualization environment. Amazon Managed Grafana integrates with Timestream through a provided data source plugin. (For more information, see Connect to an Amazon Timestream data source.) The plugin helps to set up a dashboard that displays all the collected metrics.

The following graphs are from the Amazon Managed Grafana dashboard. They display the measured water flow in liters per minute and the measured temperature in degrees Celsius over time.

Figure 3: Amazon Managed Grafana monitoring dashboard

The upper graph in Figure 3 displays flow measurements over a period of about eleven hours. The pictured water flow pattern is characteristic of a water pump that was turned on and off repeatedly. The lower graph displays water temperature variations from about 20 °C to 40 °C over the same time frame.

Advanced Use Cases

Another advantage of having a historical data set for each sensor is that you can use SageMaker to train a machine learning model. For the metering data use case, it can be useful to have a model that provides real-time anomaly detection. By employing such a system, operators can be alerted quickly to abnormalities or malfunctions and investigate them before major damage is caused.

Figure 4: Two examples of anomalies in water flow monitoring

Figure 4 contains two examples of what a water flow anomaly could look like. The graph displays water flow measurements over a period of roughly 35 minutes and contains two irregularities. Both anomalies last roughly two minutes and are highlighted with red rectangles. They were caused by a temporary leak in a water pipe and can be identified thanks to the noticeable changes in the flow pattern.

SageMaker provides several built-in algorithms and pre-trained models you can use for automated anomaly detection. Using these tools, you can get started quickly because there is little to no coding required to begin running experiments. In addition, the built-in algorithms are already optimized for parallelization across multiple instances, should you require it.

Amazon’s Random Cut Forest (RCF) algorithm is one of the built-in algorithms that we tested with this architecture. RCF is an unsupervised algorithm that associates an anomaly score with each data point. Unsupervised algorithms train on unlabeled data. See What’s the difference between supervised and unsupervised machine learning to learn more. The computed anomaly score helps to detect anomalous behavior that diverges from well-structured or patterned data in arbitrary-dimensional input. In addition, the algorithm scales with the number of features, instances, and the data set size. As a rule of thumb, scores more than three standard deviations from the mean score are considered anomalous. Since RCF is an unsupervised algorithm, there is no need to provide labels for the training process, which makes it especially suitable for sensor data where no accurate labeling of anomalies is available.
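As a rough sketch, training and deploying the built-in RCF algorithm with the SageMaker Python SDK could look like the following. The execution role, instance types, file name, and hyperparameter values are assumptions; in practice, the training data would come from the flow measurements stored in Timestream.

import numpy as np
from sagemaker import RandomCutForest

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Historical flow measurements exported as one value per line (placeholder file).
flow_history = np.loadtxt("flow_measurements.csv").reshape(-1, 1).astype("float32")

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=512,
    num_trees=50,
)

# Train on the unlabeled historical data and deploy a real-time endpoint
# that the Lambda function can call to compute anomaly scores.
rcf.fit(rcf.record_set(flow_history))
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m5.large")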

Once the model is trained on the data set, it can compute anomaly scores for all of the meter’s data points, which can then be saved in a separate Timestream database for further reference. You should also define a threshold to classify when a calculated score is considered anomalous. For visualization purposes, Amazon Managed Grafana can be used to plot the classified scores (see Figure 5).
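Following the rule of thumb mentioned above, the threshold can be derived from the scores that the model assigns to the historical data. A minimal sketch, with placeholder score values:

import numpy as np

# Placeholder: anomaly scores the trained model assigned to historical,
# mostly normal flow measurements.
training_scores = np.array([0.71, 0.74, 0.69, 0.73, 0.70, 0.72])

# Flag scores more than three standard deviations above the mean score.
threshold = training_scores.mean() + 3 * training_scores.std()

def is_anomalous(score):
    return score > threshold

print(is_anomalous(0.70))  # False - a typical score
print(is_anomalous(2.90))  # True - well above the threshold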

Figure 5: Amazon Managed Grafana widget showing RCF anomaly classification

Figure 5 displays a cutout of a Managed Grafana dashboard with a time series and state timeline widget visible. The time series represents water flow measurements and contains a one-minute section of anomalous flow. The state timeline widget displays the anomaly classifications of the RCF algorithm, where green indicates a normal state and red an anomalous one.

If the algorithm identifies an anomalous data point, a wide range of automated actions can be performed. For example, the solution can alert users through an SMS message or email using Amazon Simple Notification Service (Amazon SNS). Because anomaly scores are calculated in near real time, potential issues can be detected quickly, before major damage is caused.
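A minimal sketch of such a notification, assuming an existing SNS topic with email or SMS subscriptions (the topic ARN is a placeholder):

import boto3

sns = boto3.client("sns")

ALERT_TOPIC_ARN = "arn:aws:sns:eu-central-1:123456789012:meter-alerts"  # placeholder

def send_alert(device_id, score, threshold):
    # Every subscriber of the topic receives this notification.
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=f"Anomalous flow detected on {device_id}",
        Message=(
            f"The anomaly score {score:.2f} exceeded the configured "
            f"threshold of {threshold:.2f}. Please inspect the meter."
        ),
    )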

Conclusion

In summary, this blog post discussed how existing metering data can be integrated into AWS to unlock additional value. This solution collects data from analog sensors, ingests it into AWS IoT Core using an AWS IoT Greengrass device, processes and stores the measurements in Amazon Timestream, and performs anomaly detection using SageMaker.

While this example focuses on water meters, the core components can be adapted to work with any type of metering device. If you want to implement a similar system, explore the AWS services that we discussed and experiment with your own meter monitoring solutions. If you want to develop a production-ready application, replace the Raspberry Pi Zero with a device better suited for production workloads. For suggestions and other options, see the AWS qualified device catalog.

For another discussion about leak detection, see Detect water leaks in near real time using AWS IoT. If you are interested in anomaly detection applied to agriculture, please see Streamlining agriculture operations with serverless anomaly detection using AWS IoT.

About the authors

Tim Voigt

Tim Voigt is a Solutions Architect at AWS in the PACE team, which stands for Prototyping and Cloud Engineering. He is based in Germany and works at AWS while pursuing his graduate studies in computer science. Tim is passionate about developing novel solutions to solve real-world problems and diving deep on the technical concepts that underlie them.

Christoph Schmitter

Christoph Schmitter is a Solutions Architect in Germany who works with Digital Native customers. Christoph specializes in Sustainability where he supports businesses as they transform to building sustainable products and solutions. Prior to AWS, Christoph gained extensive experience in software development, architecture and implementing cloud strategies. He is passionate about everything tech – from building scalable and resilient systems to connecting his kids’ robots to the cloud. Outside of work, he enjoys reading, spending time with his family, and fiddling with technology.