AWS for Industries

Solving Scalability Challenges in Industry 4.0 with a Cloud Provider-Agnostic Edge Solution

In today’s rapidly evolving industrial landscape, the manufacturing sector is leading the way in using digital technology to revolutionize processes and improve efficiency. However, as this technological revolution unfolds, organizations are encountering significant challenges in adopting and implementing Industry 4.0 solutions, particularly for globally decentralized organizations.

One of the challenges related to Industry 4.0 adoption is the need for scalable, maintainable, and replicable infrastructure to collect and connect data from a wide range of machines and devices. Traditional, monolithic tools can have difficulties with meeting these needs due to a lack of flexibility and scalability. It’s crucial to have a unified edge solution that respects customer requirements for a cloud provider–agnostic approach and that addresses the scalability challenges inherent in a globally decentralized organization.

A unified namespace in the manufacturing industry acts as the backbone of Industry 4.0 by providing a standardized, centralized data model that integrates information from diverse sources, such as sensors, machines, and enterprise systems. This unified data repository, implemented by an MQTT broker, provides a comprehensive view of manufacturing processes, facilitating near real-time monitoring and decision-making. As manufacturing processes become increasingly digitized and interconnected, the volume of data grows exponentially, leading to challenges in scalable data ingestion and management.

In this blog post, we’ll introduce a scalable, cloud provider–agnostic edge solution and its infrastructure designed to seamlessly integrate within the Industry 4.0 framework, addressing the critical need for flexibility and efficiency in industrial operations. This solution not only overcomes the limitations of traditional tools by offering scalability and flexibility but also adheres to a cloud provider–agnostic approach, promoting wide compatibility and ease of adoption across global manufacturing operations.

Solution Overview

The edge solution for Industry 4.0 integrates GitOps principles with Flux CD and Amazon EKS Anywhere to enhance scalability and manageability of Kubernetes environments across on-premises and cloud infrastructures. By using Git as the single source of truth and Flux CD for continuous deployment and reconciliation, the solution ensures consistent and automated management of infrastructure and applications. Its cloud provider–agnostic approach enables deployment flexibility, preventing vendor lock-in and simplifying maintenance across decentralized IT infrastructures. The versatile ingestion and computing solution processes MQTT data from a unified namespace and publishes it to various targets, utilizing a modular and multi-threaded architecture for efficient, scalable data handling and edge processing. This setup leverages Kubernetes for dynamic data processing and deployment, promoting agility and scalability in smart manufacturing solutions.

Amazon EKS Anywhere for Industry 4.0 use cases

Amazon EKS Anywhere, announced at AWS re:Invent 2020, enables you to create and operate Kubernetes clusters on your own infrastructure. Kubernetes provides a powerful platform that is compatible across cloud providers, using container orchestration to deliver scalability, high availability, and efficient resource management.

Using Amazon EKS Anywhere, you can deploy, scale, and operate Kubernetes clusters consistently both in the cloud and on-premises, streamlining container orchestration, application deployment, and management tasks. It offers a unified control plane, security features, and seamless integration into other AWS services, facilitating the flexible deployment of containerized applications in a hybrid cloud infrastructure. Kubernetes clusters combined with the robust management capabilities of Amazon EKS Anywhere address the scalability challenges of globally decentralized organizations by providing a unified, flexible, and scalable edge infrastructure solution that can evolve with the Industry 4.0 platform.

Amazon EKS Anywhere offers a robust platform for smart manufacturing within the Industry 4.0 framework, providing essential benefits that are tailored to enhancing operational efficiency, flexibility, and scalability. By enabling decentralized processing at the edge, the service facilitates near real-time data analysis and decision-making directly at the production source, which can reduce latency and increase the responsiveness of manufacturing systems. This local processing capability promotes the flexible scaling of operations, efficiently adapting to demand fluctuations with minimal disruption.

GitOps and integrating Flux CD with Amazon EKS Anywhere for enhanced scalability

At its core, GitOps is about using Git, a widely used version control system, as the single source of truth for declarative infrastructure and applications. In GitOps, all changes to the infrastructure and applications are made through Git. This method not only brings a version control tool into the deployment process but also verifies that the state of your systems aligns precisely with the code in your repository. In this solution, another key tool is Flux CD, which automates the deployment of applications by continually aligning the cluster’s state with the configurations specified in the Git repository. This continual reconciliation between the desired state in Git and the current state of the Kubernetes cluster is a fundamental aspect of Flux CD’s operation. It brings a level of automation and precision that is vital for large-scale, dynamic environments. Flux CD supports multi-tenancy and can manage multiple clusters, making it an ideal choice for the complex deployments that are often seen in manufacturing data pipelines.
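Conceptually, this reconciliation can be thought of as a diff between the desired state held in Git and the observed state of the cluster. The following Python sketch is purely illustrative of that idea (Flux CD itself is written in Go and operates on Kubernetes objects, not plain dictionaries):

```python
def reconcile(desired: dict, observed: dict) -> dict:
    """Return the changes needed to move the observed state to the desired state."""
    changes = {}
    # Resources present in Git but missing or drifted in the cluster get (re)applied.
    for name, manifest in desired.items():
        if observed.get(name) != manifest:
            changes[name] = ("apply", manifest)
    # Resources in the cluster that are no longer in Git get pruned.
    for name in observed:
        if name not in desired:
            changes[name] = ("prune", None)
    return changes

desired = {"ingestion": {"replicas": 3}, "broker": {"replicas": 1}}
observed = {"ingestion": {"replicas": 1}, "legacy-app": {"replicas": 2}}
print(reconcile(desired, observed))
```

Flux CD runs this kind of comparison continually, so drift in the cluster is corrected without manual intervention.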


Figure 1. GitOps using Flux CD on Amazon EKS Anywhere

Integrating Flux CD into Amazon EKS Anywhere significantly enhances scalability in Kubernetes environments. When used alongside Amazon EKS Anywhere, Flux CD’s automated deployment seamlessly extends to on-premises clusters, creating a unified and scalable approach to infrastructure management. The key to scalability lies in Flux CD’s ability to automatically handle cluster configurations, application deployments, and resource updates. In environments like Amazon EKS Anywhere, which is designed to run Kubernetes clusters on-premises, managing a consistent state across varied environments can be challenging. Flux CD simplifies this task by verifying that all clusters, regardless of location, are consistently updated and maintained in line with the configurations stored in Git. This automated, version-controlled approach reduces manual oversight and the potential for human error, which is especially important during scale-up operations.

An edge solution for Industry 4.0 implemented with a cloud provider–agnostic strategy

Integrating cloud provider–agnostic principles with an edge solution in smart manufacturing platforms requires a strategic approach. It involves selecting technologies and platforms that not only meet current needs but also are adaptable to future changes and innovations. This means prioritizing open standards, modular architectures, and a strong focus on interoperability and data portability. By designing with a cloud provider–agnostic approach, manufacturing organizations can avoid provider lock-in for their operations, enhancing flexibility and risk mitigation so that the organization can adapt to new technologies and market changes without incurring excessive migration costs.

A cloud provider–agnostic edge solution allows applications and services to run on any cloud provider or in any on-premises environment with minimal to no modifications. This uniformity is vital for organizations with a decentralized IT infrastructure because it facilitates the deployment of standardized solutions across all divisions, regardless of the underlying cloud services or local infrastructure. This standardization simplifies management, reduces the learning curve for IT staff, and streamlines the integration of new technologies or services.

A cloud provider–agnostic approach greatly simplifies the maintenance of edge solutions across a wide network of global divisions. Since the underlying infrastructure can vary widely across such an expansive organization, a cloud provider–agnostic approach minimizes the need for specialized management tools for different environments. Organizations can use a consistent set of tools and processes for deployment, monitoring, and management, significantly reducing operational complexity and maintenance overhead. This unified management approach not only reduces costs but also ensures that all divisions adhere to the same standards of reliability and performance.

A custom-built, scalable ingestion and computing solution

The edge ingestion and computing solution depicted in the following figure 2 is a versatile application designed to ingest MQTT data generated from a unified namespace (an MQTT broker), process the ingested data, and publish it to various targets in the cloud or on-premises. Because the solution is designed with cloud provider–agnostic principles in mind, you can also implement additional publishing targets for any cloud provider or on-premises environment without modifying the existing codebase. Except for the publisher implementations for AWS cloud resources, all of the edge solution’s implementations are based on open-source technologies that are interoperable between cloud providers.

The solution utilizes an adaptable data pipeline composed of dispatcher, processor, and publisher components that are wired together based on an external JSON configuration. This composition provides flexibility in constructing data pipelines at runtime. It operates with a multi-threaded architecture, enabling the simultaneous processing of multiple data sources and the parallel publication of processed data to multiple targets. This makes it a powerful and efficient solution for edge data ingestion and processing at scale.
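The dispatcher–processor–publisher composition can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the actual implementation: the class names, the stand-in uppercase processor, and the in-memory publisher are all assumptions. It shows the core idea of a worker thread per target draining a queue, running the configured processors, and handing the result to a publisher.

```python
import queue
import threading

class StdoutPublisher:
    """Stand-in publisher; a real one would write to S3, Kafka, or MQTT."""
    def __init__(self):
        self.published = []
    def publish(self, payload):
        self.published.append(payload)

class UppercaseProcessor:
    """Stand-in processor; a real one might call a remote inference endpoint."""
    def process(self, payload):
        return payload.upper()

class Dispatcher:
    """Routes each message through its target's processors, then publishes it.
    One worker thread per target allows parallel publication."""
    def __init__(self, targets):
        self.targets = targets  # name -> (processors, publisher, queue)
        self.threads = []
    def start(self):
        for name, (procs, pub, q) in self.targets.items():
            t = threading.Thread(target=self._drain, args=(procs, pub, q), daemon=True)
            t.start()
            self.threads.append(t)
    def _drain(self, procs, pub, q):
        while True:
            payload = q.get()
            if payload is None:  # sentinel: stop this worker
                return
            for p in procs:
                payload = p.process(payload)
            pub.publish(payload)
    def dispatch(self, target_name, payload):
        self.targets[target_name][2].put(payload)
    def stop(self):
        for _, (_, _, q) in self.targets.items():
            q.put(None)
        for t in self.threads:
            t.join()

# Wire one target, mirroring the structure of the JSON configuration.
publisher = StdoutPublisher()
targets = {"mqtt-target": ([UppercaseProcessor()], publisher, queue.Queue())}
d = Dispatcher(targets)
d.start()
d.dispatch("mqtt-target", "sensor reading")
d.stop()
print(publisher.published)  # processed payloads delivered to the publisher
```

Because each target owns its own queue and worker, adding a target in the configuration scales the pipeline without touching the other targets.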

The application’s configurable flexibility enables Kubernetes to handle diverse data processing and deployment requirements at the edge. By using Kubernetes, you can efficiently manage the edge solution, ensuring that data ingestion, processing, and deployment are tailored to the specific needs of each deployment target. This contributes to the scalability and agility of the edge solution.


Figure 2. Edge ingestion and computing solution

Following is an example configuration file for the ingestion application.

{
    "mqtt-broker-credential": {
        "secret-name": "mqtt-credential",
        "secret-region": "us-east-1"
    },
    "targets": {
        "s3-target": {
            "type": "s3",
            "event-type": "single",
            "s3-bucket": "sample-bucket",
            "region": "us-east-1",
            "key-prefix": "sample-key",
            "object-prefix": "sample-obj",
            "format": "parquet"
        },
        "mqtt-target": {
            "type": "mqtt",
            "event-type": "single",
            "mqtt-credential": {
                "secret-name": "mqtt-credential",
                "secret-region": "us-east-1"
            },
            "mqtt-broker-host": "127.0.0.1",
            "processors": [
                "edge-inference"
            ]
        }
    },
    "processors": {
        "edge-inference": {
            "type": "remote",
            "base-url": "http://edge-processor.svc.cluster.local:8000/api/v1",
            "processor-id": "inference-model-1"
        }
    },
    "subscriptions": {
        "com1/plant1/area1/+/Message1": {
            "qos": 2,
            "target": "s3-target"
        },
        "com1/plant1/area1/+/Message2": {
            "qos": 2,
            "target": "s3-target"
        },
        "com1/plant1/area2/+/Message1": {
            "qos": 2,
            "target": "mqtt-target"
        }
    }
}

As shown in the previous code block, the edge solution’s behavior is defined by an external JSON configuration that is divided into four main sections: MQTT broker credentials, publishing targets, processor definitions, and MQTT subscription topics.

1. MQTT broker credentials

For MQTT authentication, username and password credentials are securely stored in AWS Secrets Manager, which helps you manage, retrieve, and rotate database credentials, API keys, and other secrets. These credentials are referenced in the configuration: the credential configuration specifies the secret’s name and its AWS Region, and the secret itself holds a key-value pair representing the username and password.
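As a sketch, retrieving and parsing the credential might look like the following. The JSON key names inside the secret (`username`, `password`) are assumptions about how the secret is stored, and the helper names are illustrative.

```python
import json

def parse_mqtt_credential(secret_string: str) -> tuple[str, str]:
    """Extract the username/password pair from the secret's JSON payload.
    The key names below are an assumption about how the secret is stored."""
    secret = json.loads(secret_string)
    return secret["username"], secret["password"]

def fetch_mqtt_credential(secret_name: str, region: str) -> tuple[str, str]:
    """Retrieve the broker credential referenced in the configuration.
    Requires boto3 and valid AWS credentials at runtime."""
    import boto3  # imported lazily so the parsing helper stays dependency-free
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return parse_mqtt_credential(response["SecretString"])

# Example payload as it might be stored in Secrets Manager:
user, pwd = parse_mqtt_credential('{"username": "edge-client", "password": "s3cret"}')
```

Keeping the secret out of the JSON configuration means the configuration file can live safely in Git under GitOps.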

2. Publishing targets

The publishing target section specifies parameters for Amazon Simple Storage Service (Amazon S3), an object storage service. It includes resource names, the AWS Region, and additional settings, such as file format and object naming conventions. For example, a target setup for Amazon S3 includes a specific bucket name, an AWS Region, a key prefix, and a file format. The solution also supports other targets, such as MQTT, other AWS services, and Apache Kafka. Each target requires its own parameters to properly function as a publishing target.
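As an illustration of the object naming settings, the following sketch composes an S3 object key from the target’s key-prefix, object-prefix, and format values. The exact naming scheme (timestamp suffix, separators) is an assumption for illustration, not the solution’s documented behavior.

```python
from datetime import datetime, timezone

def build_s3_key(key_prefix: str, object_prefix: str, fmt: str,
                 ts: datetime) -> str:
    """Compose an object key from the target's configuration values.
    The timestamp-based naming scheme here is an assumption."""
    return f"{key_prefix}/{object_prefix}-{ts.strftime('%Y%m%dT%H%M%S')}.{fmt}"

key = build_s3_key("sample-key", "sample-obj", "parquet",
                   datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc))
print(key)  # sample-key/sample-obj-20240115T120000.parquet
```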

There are two publishing targets defined: an Amazon S3 bucket and an MQTT broker. Any messages published on com1/plant1/area1/+/Message1 and com1/plant1/area1/+/Message2 topics will be ingested into the Amazon S3 bucket as a Parquet object. Messages published on com1/plant1/area2/+/Message1 will trigger an edge inference model that is served by a RESTful API. The messages will serve as an input to the model. The output of the model will be transmitted to the MQTT broker configured in the JSON.

3. Processor definition

The processor definition section specifies the endpoint of the edge inference service that runs as a RESTful API and the unique identifier of the inference model that should run when called.
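A call to the remote processor might be constructed like the following sketch. The `/processors/{id}` URL path and the JSON payload shape are assumptions for illustration, since the API contract of the inference service isn’t shown here.

```python
import json
import urllib.request

def build_processor_request(base_url: str, processor_id: str, payload: dict):
    """Build the HTTP request for the remote processor. The URL path and
    payload shape are hypothetical, for illustration only."""
    url = f"{base_url}/processors/{processor_id}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_processor_request(
    "http://edge-processor.svc.cluster.local:8000/api/v1",
    "inference-model-1",
    {"temperature": 21.5},
)
print(req.full_url)
# Actually sending it (urllib.request.urlopen(req)) requires the in-cluster service.
```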

4. Subscription topics

The subscription topics section defines the MQTT topics that the application will subscribe to, linking them with the defined publishing targets. This setup offers flexibility in handling MQTT payloads, with options for single and multi-level wildcard topics.
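Wildcard matching of incoming topics against the configured subscription filters can be sketched as follows. This is a simplified matcher (for example, it doesn’t validate that `#` appears only as the last level, as the MQTT specification requires) and is not the solution’s actual routing code.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Match an MQTT topic against a subscription filter with '+'
    (single-level) and '#' (multi-level) wildcards."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":                      # matches this level and all below
            return True
        if i >= len(t_levels):
            return False
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)

def route(subscriptions: dict, topic: str):
    """Return the target names whose subscription filters match the topic."""
    return [cfg["target"] for pat, cfg in subscriptions.items()
            if topic_matches(pat, topic)]

subs = {
    "com1/plant1/area1/+/Message1": {"qos": 2, "target": "s3-target"},
    "com1/plant1/area2/+/Message1": {"qos": 2, "target": "mqtt-target"},
}
print(route(subs, "com1/plant1/area1/line3/Message1"))  # ['s3-target']
```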

Following is example code illustrating a Kubernetes manifest designed for deploying the proposed edge solution.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingestion
  namespace: ingestion
  labels:
    name: ingestion
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingestion
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ingestion
    spec:
      serviceAccountName: edge-computing-ingestion-sa
      imagePullSecrets:
      - name: regcred
      containers:
      - image: <AWS-Account-Id>.dkr.ecr.<AWS-Region>.amazonaws.com/edge-demo:latest
        name: ingestion
        env:
        - name: EDGE_BROKER_HOST
          value: <MQTT broker host address>
        - name: EDGE_MQTT_PORT
          value: "1883"
        - name: EDGE_SUBSCRIPTION_CONFIG
          value: |
            {
                "mqtt-broker-credential": {
                    "secret-name": "mqtt-credential",
                    "secret-region": "us-east-1"
                },
                "targets": {
                    "s3-target": {
                        ...
                    },
                    "mqtt-target": {
                        ...
                    }
                },
                "processors": {
                    "edge-inference": {
                        ...
                    }
                },
                "subscriptions": {
                    "com1/plant1/area1/+/Message1": {
                        "qos": 2,
                        "target": "s3-target"
                    },
                    "com1/plant1/area1/+/Message2": {
                        "qos": 2,
                        "target": "s3-target"
                    },
                    "com1/plant1/area2/+/Message1": {
                        "qos": 2,
                        "target": "mqtt-target"
                    }
                }
            }
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - . ./venv/bin/activate && python3 "app.py"
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always

In this setup, the JSON configuration outlined previously is embedded directly within the manifest. Flux CD tracks any modifications made to the configuration as well as to the manifest itself. Upon detecting changes, Flux CD automatically applies these updates to the Kubernetes cluster, ensuring that the application state matches the configuration detailed in the manifest, which is maintained in the Git repository.

Deployment options for ingestion at scale

The deployment strategy for the custom-built edge application is largely dictated by the configuration you choose. One viable approach is to deploy multiple pods, with each pod configured to handle a distinct portion of the MQTT topic space. This method provides specialized control over each data stream, facilitating precise scaling that is based on the individual needs of each topic. While this approach offers detailed management of each pod, it introduces a level of complexity in monitoring and managing a variety of configurations. This approach is depicted in the following figure 3.


Figure 3. The multi-pod deployment strategy
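One way to derive per-pod configurations from a single configuration file is sketched below. The splitting function and its output shape are hypothetical, shown only to illustrate how the multi-pod strategy could partition the topic space by target.

```python
def split_config_by_target(config: dict) -> dict:
    """Split one ingestion configuration into per-pod configurations, each
    covering the subscriptions of a single target. Illustrative sketch of
    the multi-pod strategy, not part of the actual solution."""
    pods = {}
    for pattern, sub in config["subscriptions"].items():
        target = sub["target"]
        pod = pods.setdefault(target, {
            "mqtt-broker-credential": config["mqtt-broker-credential"],
            "targets": {target: config["targets"][target]},
            "subscriptions": {},
        })
        pod["subscriptions"][pattern] = sub
    return pods

# Using a trimmed version of the configuration shown earlier:
cfg = {
    "mqtt-broker-credential": {"secret-name": "mqtt-credential",
                               "secret-region": "us-east-1"},
    "targets": {"s3-target": {"type": "s3"}, "mqtt-target": {"type": "mqtt"}},
    "subscriptions": {
        "com1/plant1/area1/+/Message1": {"qos": 2, "target": "s3-target"},
        "com1/plant1/area1/+/Message2": {"qos": 2, "target": "s3-target"},
        "com1/plant1/area2/+/Message1": {"qos": 2, "target": "mqtt-target"},
    },
}
pods = split_config_by_target(cfg)
print(sorted(pods))  # ['mqtt-target', 's3-target']
```

Each resulting configuration could then be embedded in its own Deployment manifest and tracked in Git, so Flux CD scales each pod’s topic slice independently.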

You also have the option of deploying a set of replicas of the edge application, all configured to subscribe uniformly to the same set of MQTT topics. This approach uses the shared subscriptions feature of MQTT v5 to distribute the data load evenly across all replicas, simplifying overall management by maintaining a single configuration. This strategy, however, requires that each replica be capable of processing the entire spectrum of MQTT topics, which might not be ideal for certain processing requirements. This strategy is depicted in the following figure 4.

Figure 4. The replica deployment strategy
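In MQTT v5, a shared subscription is expressed by prefixing the topic filter with `$share/` and a group name; the broker then distributes matching messages across all clients subscribed under the same group. The following minimal sketch builds such a filter (the group name `ingestion` is an arbitrary example).

```python
def shared_subscription(group: str, topic_filter: str) -> str:
    """Build an MQTT v5 shared-subscription filter. All clients that
    subscribe under the same group share the message load for the filter."""
    return f"$share/{group}/{topic_filter}"

# Every replica of the edge application subscribes with the same group,
# so the broker load-balances messages across the replicas.
print(shared_subscription("ingestion", "com1/plant1/area1/+/Message1"))
# $share/ingestion/com1/plant1/area1/+/Message1
```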

The integration of GitOps practices, particularly with Flux CD, brings a significant enhancement to these deployment strategies. Flux CD specializes in the automated synchronization of configuration changes from a Git repository to Kubernetes clusters. This automation is crucial, whether the option is multiple distinct pod configurations or a uniform replica setup.

The GitOps framework with Flux CD ensures that any updates or modifications to the application’s configuration are efficiently and seamlessly rolled out. This not only makes the deployment process more streamlined but also injects a level of flexibility and adaptability into the system. The ability to swiftly modify configurations in response to changing requirements or operational demands is a key benefit here, underlining the robustness and efficiency of our data ingestion strategy in dynamic environments.

Conclusion

This blog described a custom-built edge ingestion and computing solution for on-premises or hybrid use cases that is designed with cloud provider–agnostic principles. We’ve delved into its infrastructure solution to achieve scalable deployment and management, highlighting the integral roles of Kubernetes, Flux CD, and Amazon EKS Anywhere.

These powerful tools collectively enhance the solution’s scalability and adaptability in unified namespace ingestion:

  • Kubernetes provides the foundation for efficient management and customization of data ingestion processes.
  • Flux CD brings in the benefits of GitOps for automated, error-free deployments.
  • Amazon EKS Anywhere extends these capabilities, providing consistent and effective management across both cloud and on-premises environments.

Together, these tools create a robust system that is tailored to data ingestion, edge computing, and deployment infrastructure and is optimized for Industry 4.0 smart manufacturing solutions.

Junsu Shin

Junsu Shin is a Senior IoT Architect at Amazon Web Services. He specializes in IoT, IIoT, edge computing, and big data platforms. Junsu works with automotive and manufacturing customers, offering guidance and technical assistance to build advanced IoT and edge solutions. His work focuses on enhancing the value of these solutions by leveraging Amazon Web Services’ cutting-edge technologies.

Jared Cook

Jared Cook is a Senior Cloud Infrastructure Architect based in Dallas, Texas, where he has been part of the AWS Professional Services team for over 5 years. In his role, Jared leverages his expertise in Infrastructure as Code, DevOps, and Kubernetes to help customers design and deploy scalable, secure, and highly available cloud infrastructure on AWS. Jared is passionate about empowering organizations to maximize the benefits of the cloud through the adoption of modern software engineering practices and cloud-native technologies.

Robert Oikarinen

Robert Oikarinen is a Principal Engagement Manager with AWS Professional Services focused on supporting automotive and manufacturing customers. Based out of Detroit, Michigan, he has been with Amazon for over 9 years and has focused on helping customers integrate IoT data with machine learning models.

Vladi Salomon

Vladi Salomon is a Principal IoT Data Architect with Amazon Web Services. He has over 7 years of experience in IoT architecture across different verticals, including Industrial IoT (IIoT), Smart Home, Smart City, and Mining, as well as data warehousing and big data platforms. In recent years, he has focused on how to bring AI to IoT through scalable MLOps platforms. As a member of AWS Professional Services, he works with customers of various scales and industries, architecting and implementing a variety of end-to-end IoT solutions.