AWS for Industries
How Vodafone is using AWS and the Broadband Forum User Services Platform (USP) standard to re-architect the management of its Customer Premises Equipment (CPE) and become more adaptive to changes
Vodafone Group Plc [VOD] is a global communications company offering a combination of mobile, fixed, TV, Internet-of-Things (IoT), and cloud & security services to consumers and enterprises. With network presence in many countries worldwide, Vodafone serves more than 330M mobile subscribers and over 28M fixed broadband users in Europe.
Like many other telcos, Vodafone has traditionally managed its CPEs with a single-controller approach. Now Vodafone is transitioning to a multi-controller architecture where each controller can be any end-user application or solution. The partitioning of scope and responsibilities into multiple controllers is crucial to evolving Vodafone’s CPEs into user service platforms. New applications can now be implemented and rolled out more independently, without affecting existing controllers such as the traditionally used Auto Configuration Servers (ACS). This post discusses the overall benefits and architecture implemented by Vodafone and how User Services Platform (USP/TR-369) notification messages can be handled in a serverless manner using AWS IoT Core. Based on a typical telco industry scenario, it describes a telemetry data pipeline that is triggered when a USP Agent running on a CPE sends a Boot! event. Message routing leverages the AWS IoT Rules Engine’s new protobuf decoding function, an AWS Lambda function transforms the payload, and the resulting JSON data is persisted in Amazon Simple Storage Service (Amazon S3) via an Amazon Kinesis Data Firehose delivery stream.
Solution overview
Vodafone has traditionally managed its CPEs with a single-controller approach using SNMP and TR-069 (CPE WAN Management Protocol). Broadband Forum standardized the new TR-369 or USP protocol (https://usp.technology/) as a natural evolution of TR-069. Vodafone is transitioning to the new TR-369 standard to turn its CPEs into an application platform. Highly scalable and fast communication between the CPEs and multiple controllers is crucial to realizing this. With the former N:1 relationship between devices and controller, whenever a service provider needed information from a device, it had to collect it from the single controller. Technically, this made the controller a communication bottleneck; organizationally, it created dependencies that made autonomous service delivery impossible. Moreover, the session-oriented nature of TR-069 requires a connection to the CPE to be established first. Now, with an N:M relationship between devices and controllers, it becomes critical to architect the communication so that it scales with the number of devices and controllers, increases data transfer efficiency, and keeps the connections between devices and controllers always on.
In the new architecture, a controller can be any end-user application or solution. Vodafone decided to build on AWS to take advantage of a cloud that is easy to use, cost-effective, scalable, and secure. This allows Vodafone to become more adaptive to changes, as new services can be deployed faster and then scale flexibly with demand. It also enables Vodafone to focus on functional feature delivery and delegate non-functional aspects like reliability and scalability to AWS. To realize this new communications architecture, Vodafone leverages AWS IoT Core and other AWS serverless services.
Remote CPE management with TR-069 comes with limitations due to its age and history, which its successor TR-369 is trying to address. The biggest difference is the so-called “multi-controller paradigm”. It allows the partitioning of scope and responsibilities into multiple controllers, a familiar concept and core principle of micro-service architectures. The following illustration shows the agent-controller relationship, where an agent is connected to one or more controllers, possibly via different message transfer protocols (MTPs) supported by USP.
Figure 1: USP Agent and Controller Architecture [Src: BBF USP (TR-369) Specification]
Vodafone uses MQTT to exchange messages between the endpoints in the platform. The USP Agent deployed on the CPE is based on OB-USP-A, which is open source and driven by the BBF. Multiple USP Controllers are involved in the solution, all fully developed in-house. One is needed for CPE onboarding (bootstrapping), another deals with telemetry data, and another is used by a native mobile application (Android, iOS), to name just a few examples.
Solution
The solution we describe has three main benefits. First, it enables multiple controllers for the CPEs, which allows responsibilities and scope to be partitioned. Each controller can support different use cases, which gives enhanced flexibility and resiliency. Being able to better modularize the solution with clear boundaries and responsibilities is especially important for a large enterprise like Vodafone. Second, it enables the CPE as a Service Platform, so that new services (or capabilities) can be offered via a combination of software modules running on the CPEs together with the rich capabilities that USP offers, like advanced device recognition, managed Wi-Fi, security services, and others. The combination of new services with the optimization of existing ones (for example, enhanced reliability and faster troubleshooting) will increase customer satisfaction and allow Vodafone to further differentiate itself from market competitors and unlock new sources of revenue. Third, because the solution has been built in-house and the new architecture is based on serverless technologies and services that can easily be coupled with other services, it is highly adaptive to changes in load, resilient to failures, and can be evolved rapidly to meet new business needs when compared to legacy architectures.
This is why Vodafone’s “Home Technology Products” department set up a group of cross-functional teams according to the Scaled Agile Framework (SAFe). CPE Engineering and Backend Architecture experts, as well as other subject matter experts, started working on the design and implementation of a new CPE Management Platform. The platform is fully USP/TR-369-powered and remains compatible with legacy infrastructure using TR-069 (hybrid architecture). After one year, the first USP-enabled CPE was released in the UK, and the backend platform went live serving a growing base of customers.
In one part of the overall Vodafone solution, which is the focus of this post, a CPE pushes USP notification messages in the Protobuf format into AWS IoT Core for several types of events. Messages are then decoded to the JSON format using the AWS IoT Core Rules Engine. Based on the message payload, the IoT rules route the message either to an Amazon Simple Queue Service (Amazon SQS) queue to be further processed by a micro-service backend, or to an AWS Lambda function that enriches the event with meta-information and forwards it into an Amazon Kinesis Data Firehose delivery stream. This data is persisted in an Amazon S3 bucket, from which it is later ingested into Big Data pipelines for analytics purposes. The following figure shows the corresponding architecture and flow.
Figure 2: Solution Architecture
Authentication of CPEs
Authentication of CPEs is based on device-specific certificates embedded at manufacturing time, leveraging AWS IoT Core’s just-in-time registration (JITR) method. When the device boots for the first time, it connects to a pre-defined MQTT broker address, which validates the presented certificate against a list of trusted Certificate Authorities (CAs). A Lambda function handling these platform onboarding requests then assigns the appropriate communication policies, allowing agents and controllers to use specific topics to securely exchange messages. In this way, Vodafone implements USP’s “Trusted Broker Model” based on AWS IoT Core.
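As a rough illustration of what such a per-device communication policy could contain, the following Go sketch builds an AWS IoT policy document that limits one agent to its own client ID and topics. The topic layout (`usp/agent/...`, `usp/controller/...`), the function name `agentPolicy`, and the sample account and agent identifiers are assumptions made for this example, not Vodafone's actual configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement is one entry of an AWS IoT policy document.
type Statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource []string `json:"Resource"`
}

// PolicyDocument is the JSON policy attached to a device certificate.
type PolicyDocument struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// agentPolicy builds a policy that lets one agent connect with its own
// client ID, publish to controller topics, and receive on its own topic.
// The topic names are hypothetical and only illustrate the idea.
func agentPolicy(region, account, agentID string) PolicyDocument {
	arn := func(kind, name string) string {
		return fmt.Sprintf("arn:aws:iot:%s:%s:%s/%s", region, account, kind, name)
	}
	agentTopic := "usp/agent/" + agentID
	return PolicyDocument{
		Version: "2012-10-17",
		Statement: []Statement{
			{"Allow", []string{"iot:Connect"}, []string{arn("client", agentID)}},
			{"Allow", []string{"iot:Publish"}, []string{arn("topic", "usp/controller/*")}},
			{"Allow", []string{"iot:Subscribe"}, []string{arn("topicfilter", agentTopic)}},
			{"Allow", []string{"iot:Receive"}, []string{arn("topic", agentTopic)}},
		},
	}
}

func main() {
	doc, err := json.MarshalIndent(agentPolicy("eu-west-1", "123456789012", "cpe-0001"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))
}
```

Scoping the `iot:Connect` resource to the agent's own client ID and the subscribe and receive resources to its own topic is one way to keep a compromised device from impersonating or eavesdropping on another one.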
AWS IoT Core Rules
AWS IoT Core securely connects CPE devices, decodes protobuf messages into JSON, and routes the decoded messages to AWS services without the need for Vodafone to manage the underlying infrastructure. The AWS IoT Rules Engine native SQL decode function we use is similar to the one described at https://dev.to/iotbuilders/decoding-usp-protobuf-data-sent-by-usp-agents-using-the-recent-aws-iot-rules-engine-native-sql-decode-function-1ja0. In the implementation that is the subject of this post, two rules are needed. Both rules have a SQL statement that includes the MQTT topic used by the agent to send USP notifications to a controller. The first rule identifies the so-called “Boot!” event message and sends it to an AWS Lambda function. The second rule applies to all other USP NOTIFY types and enqueues the messages in Amazon SQS so that downstream services can consume them in a loosely coupled way.
Both rules require AWS IoT Core’s new protobuf decoding feature, as message routing can only be done based on the event payload. The following SQL statement describes the rule to route a Boot! event:
The action defined for this rule is to forward the canonical JSON representation of the message to a Lambda function. The decoded payload has the following structure:
An “inverted” rule is used to steer all other event types to Amazon SQS to have them further processed by a set of micro-services.
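The routing decision made by the two rules can be mirrored in a few lines of Go. This is only a sketch of the logic, not the Rules Engine SQL itself: the JSON field names (`body.request.notify.event.eventName` and so on) are an assumption about how the decoded USP message is laid out, and the names `route`, `targetLambda`, and `targetSQS` are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notification mirrors only the fields needed for routing. The JSON field
// names are an assumed layout of the decoded USP Notify message.
type notification struct {
	Body struct {
		Request struct {
			Notify struct {
				Event *struct {
					ObjPath   string `json:"objPath"`
					EventName string `json:"eventName"`
				} `json:"event"`
			} `json:"notify"`
		} `json:"request"`
	} `json:"body"`
}

const (
	targetLambda = "lambda" // Boot! events, enriched and sent to analytics
	targetSQS    = "sqs"    // every other notification type
)

// route returns the target for a decoded notification: Boot! events go to
// the Lambda function, everything else goes to the queue.
func route(decoded []byte) (string, error) {
	var n notification
	if err := json.Unmarshal(decoded, &n); err != nil {
		return "", fmt.Errorf("decode notification: %w", err)
	}
	ev := n.Body.Request.Notify.Event
	if ev != nil && ev.EventName == "Boot!" {
		return targetLambda, nil
	}
	return targetSQS, nil
}

func main() {
	boot := []byte(`{"body":{"request":{"notify":{"event":{"objPath":"Device.","eventName":"Boot!"}}}}}`)
	onboard := []byte(`{"body":{"request":{"notify":{"onBoardReq":{"oui":"00AABB"}}}}}`)
	for _, msg := range [][]byte{boot, onboard} {
		target, err := route(msg)
		if err != nil {
			panic(err)
		}
		fmt.Println(target)
	}
}
```

Because the second branch is simply the negation of the first, every notification matches exactly one rule, which is what the “inverted” rule achieves in the Rules Engine.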
Protobuf files
Protobuf provides the CPE with an efficient, compact, and well-structured serialization format. The built-in protocol documentation makes data serialization and deserialization more manageable than with JSON. However, both producer and consumer must operate on a shared, well-defined schema to encode and decode messages properly.
USP Records and Messages are defined as part of the BBF’s standardization process. Protobuf data files are open-source and available here:
- https://github.com/BroadbandForum/usp/blob/master/specification/usp-record-1-2.proto
- https://github.com/BroadbandForum/usp/blob/master/specification/usp-msg-1-2.proto
In its current state, the Rules Engine only supports up to two decode functions in a single SQL statement – see here. This requires the above two proto files to be merged into one. Otherwise, a nested double decode is needed in both the SELECT statement and WHERE clause, which wouldn’t allow for the implementation of decoding and message routing in a single rule.
Amazon SQS
Amazon SQS offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components.
In the above scenario, Amazon SQS provides extended retention of “important” USP messages (such as the OnboardRequest notification) that require a high processing guarantee. AWS IoT Core retains a maximum of one message per topic, which isn’t sufficient for our case. Additionally, Amazon SQS helps decouple USP agents from controllers and protects downstream systems from being overloaded by notifications. Both aspects increase the overall reliability of the solution.
AWS Lambda
AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.
We use AWS Lambda to validate the received input and enrich the Boot! message with meta-information about the event that is needed by the Big Data pipeline processing the event later. Successfully handled requests are written into an Amazon Kinesis Data Firehose delivery stream.
The AWS Lambda function is implemented in Golang, and the output data format is defined by the following struct:
Depending on the client configuration, the agent can request a USP notification response, which the AWS Lambda function handles. If the send_resp flag is set to true, the function publishes a USP message back to the agent topic to acknowledge that the notification has been successfully received and to suppress the retry mechanism employed by the agent. It uses the AWS SDK for Go to publish the response to the MQTT broker.
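The validation, enrichment, and response decision described in this section could look roughly like the following sketch. All type and field names here (`BootNotification`, `EnrichedEvent`, `receivedAt`, `source`, and so on) are hypothetical and chosen for illustration; they are not the struct used in Vodafone's function, and the actual publishing of the response via the AWS SDK is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// BootNotification holds fields taken from the decoded USP Notify message.
type BootNotification struct {
	AgentID   string            // USP endpoint ID of the sending agent
	SendResp  bool              // true if the agent expects a response
	EventName string            // expected to be "Boot!"
	Params    map[string]string // event parameters reported by the agent
}

// EnrichedEvent is a hypothetical output record for the analytics pipeline.
type EnrichedEvent struct {
	AgentID    string            `json:"agentId"`
	EventName  string            `json:"eventName"`
	Params     map[string]string `json:"params,omitempty"`
	ReceivedAt string            `json:"receivedAt"`
	Source     string            `json:"source"`
}

// exampleTime is a fixed timestamp so the example output is deterministic.
var exampleTime = time.Date(2023, time.January, 15, 10, 0, 0, 0, time.UTC)

// enrich validates the notification and adds meta-information to it.
func enrich(n BootNotification, receivedAt time.Time) (EnrichedEvent, error) {
	if n.AgentID == "" {
		return EnrichedEvent{}, errors.New("missing agent ID")
	}
	if n.EventName != "Boot!" {
		return EnrichedEvent{}, fmt.Errorf("unexpected event %q", n.EventName)
	}
	return EnrichedEvent{
		AgentID:    n.AgentID,
		EventName:  n.EventName,
		Params:     n.Params,
		ReceivedAt: receivedAt.UTC().Format(time.RFC3339),
		Source:     "usp-notify",
	}, nil
}

// needsResponse reports whether a response must be published back to the
// agent topic to stop the agent from retrying the notification.
func needsResponse(n BootNotification) bool {
	return n.SendResp
}

func main() {
	n := BootNotification{
		AgentID:   "os::00AABB-0001",
		SendResp:  true,
		EventName: "Boot!",
		Params:    map[string]string{"Cause": "LocalReboot"},
	}
	ev, err := enrich(n, exampleTime)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("publish response:", needsResponse(n))
}
```

Keeping validation and enrichment in a pure function, separate from the SDK calls that write to Firehose or publish to the broker, makes the handler straightforward to unit test.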
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is an extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.
PUT requests to Amazon S3 are billed per request, so writing every event individually is costly. Our solution reduces the total number of Amazon S3 PUT requests by at least half by adding a Firehose delivery stream between Lambda and Amazon S3 that buffers multiple events and only flushes them to Amazon S3 once either a time or data volume limit is exceeded. With this approach, several events are written to Amazon S3 in a single PUT operation, which makes the overall processing more cost-efficient.
The cost savings achieved with this approach increase as we scale up the CPE base and add support for other events, because the probability that multiple events are batched together in the Firehose delivery stream rises.
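The effect of buffering on the number of PUT requests can be illustrated with a small simulation. This is a simplified model written for this post, not Firehose's actual implementation: the function `countPuts`, the thresholds, and the sample arrival times are assumptions, and events are assumed to arrive in time order.

```go
package main

import "fmt"

// event is one record arriving at the buffer.
type event struct {
	atSec int // arrival time in seconds since start
	size  int // payload size in bytes
}

// countPuts simulates a buffer that flushes to storage when the buffered
// bytes reach maxBytes or when the buffer has been open for maxWaitSec.
// It returns the number of flushes, that is, the number of PUT requests.
func countPuts(events []event, maxBytes, maxWaitSec int) int {
	puts := 0
	buffered := 0
	opened := 0
	for _, e := range events {
		// Time limit: flush the open buffer before accepting a late event.
		if buffered > 0 && e.atSec-opened >= maxWaitSec {
			puts++
			buffered = 0
		}
		if buffered == 0 {
			opened = e.atSec
		}
		buffered += e.size
		// Size limit: flush as soon as enough data has accumulated.
		if buffered >= maxBytes {
			puts++
			buffered = 0
		}
	}
	if buffered > 0 {
		puts++ // final flush of whatever is left
	}
	return puts
}

func main() {
	events := []event{{0, 500}, {5, 500}, {20, 500}, {70, 500}, {75, 500}, {200, 500}}
	fmt.Println("unbuffered PUTs:", len(events))
	fmt.Println("buffered PUTs:", countPuts(events, 1000000, 60))
}
```

With these sample values, six events result in three buffered PUTs instead of six. The denser the event stream, the more events share one flush, which matches the observation that savings grow with the size of the CPE base.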
Amazon S3
Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
In this solution, Amazon S3 plays the role of a temporary data store, which makes data accessible for an ETL pipeline that ingests it to a Vodafone internal data lake. The actual information persisted for a received Boot! event is JSON encoded and looks as follows:
The further processing of the event used to generate business intelligence isn’t in scope of this post.
Conclusion
Using USP/TR-369 allows Vodafone to implement remote CPE management in a more flexible and agile way than was possible before. Communication with devices is based on a well-defined industry standard that supports state-of-the-art integration patterns. It ultimately transforms CPEs into “service hubs”, which provide customers access to new services and capabilities. This post describes some of the foundations needed to achieve that goal.
By implementing the USP backend using AWS managed services, Vodafone can deliver a scalable, reliable, cloud-native CPE management platform.