Your telecom cloud journey on AWS: Part 2 – A technical roadmap with AWS

Introduction

In “Blog 1 – Your Telecom Cloud Journey on AWS: Part 1 – Establishing a Foundation” we covered the importance of establishing a strong cloud foundation on AWS, and we identified some of the key capabilities that are different or need to be adapted for Telco use cases. Specifically, we can see that from a technical perspective there are three key areas that need careful consideration to achieve a well-architected solution for Telcos: Networking and Domain Name System (DNS), Security, and Observability (covered in “Blog 3 – Cloud adoption journey: Operational Practices”).

This post covers the main patterns that we observe with existing Communication Service Providers (CSPs) that illustrate the technical requirements needed for Telco workloads. This enables you to better understand the target architectures and considerations when planning a move to cloud with AWS.

Networking and DNS

One of the primary topics for Telco workloads on AWS is networking. When deploying workloads, a hybrid interconnect architecture needs to be carefully planned to ensure that the current networking design on-premises is able to interconnect with the AWS-based environment. Network segmentation, routing, IP addressing, and zoning are some of the factors that need to be evaluated in the design of your network. For better understanding, it would help to start with some discovery questions, such as:

Where are the existing data centers (DCs) and points of presence (PoPs) located?
Which AWS Region and AWS Direct Connect locations should be used?
Which applications are planned for the AWS Telco cloud, and do they need a special network configuration?
What are the total bandwidth or throughput requirements?
Is network segmentation and zoning of traffic required?
Do we use IPv4 addresses, IPv6 addresses, or both?
Are there overlapping IP addresses or a limited amount of address prefixes?
Where is the egress point to the internet (AWS or on-premises)?
How are networking components to be managed or monitored? Can we offload them to AWS?
Do we need transitive routing or traffic to or from service virtual IP (VIP) addresses?

These questions help you discover and understand what AWS services you need to build your Telco cloud network on AWS and identify considerations for your AWS network design. AWS offers a number of networking services and features that can address the variety of requirements users have when building a Telco cloud.
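The question about overlapping IP addresses can be checked early with a short script. The following is a minimal sketch in Python, assuming a hypothetical address plan; the segment names and prefixes are illustrative only and do not come from a real deployment.

```python
import ipaddress
from itertools import combinations


def find_overlaps(prefixes):
    """Return pairs of named prefixes whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in prefixes.items()}
    overlaps = []
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        # Compare only networks of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((a, b))
    return overlaps


# Hypothetical address plan covering on-premises and AWS segments.
plan = {
    "onprem-core": "10.0.0.0/16",
    "onprem-oam": "10.1.0.0/20",
    "aws-vpc-signaling": "10.0.128.0/20",
    "aws-vpc-data": "10.2.0.0/16",
    "aws-vpc-v6": "2001:db8::/56",
}
overlapping = find_overlaps(plan)
print(overlapping)
```

Any pair reported here would need renumbering or address translation before the two segments can be routed to each other.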

In this section we cover some of the unique requirements for Telco networks that need to be considered when adopting cloud.

Shared Networks account

A Shared Networks account is one of the foundational accounts in a Telco cloud environment. It is where you establish and control the networking traffic and resources. This account acts as a central hub for network routing, made possible by AWS services such as AWS Transit Gateway. Transit Gateway is a highly available and scalable service used to consolidate the Amazon Virtual Private Cloud (Amazon VPC) routing configuration for a Region with a hub-and-spoke architecture. Using Transit Gateway, you can control the routing between VPCs and the routing between AWS and on-premises.

In addition, a Shared Networks account hosts one or more Direct Connect connections. Direct Connect is a networking service that provides a private network connection between Amazon VPCs and on-premises. It’s a physical connection established through a Direct Connect provider. Over a Direct Connect link you can create one or multiple virtual interfaces (VIFs) separated by VLAN (802.1Q) tags. There are three types of VIFs that you can create: private VIF, public VIF, and transit VIF. Each VIF or VLAN is a network segment that connects on-premises to AWS. The type of VIF is linked to what it connects to: a transit VIF for Transit Gateway, a private VIF for a virtual private gateway (VGW), and a public VIF for AWS public service endpoints.

Another main component of the interconnect is the AWS Direct Connect gateway (DXGW). A DXGW is a service that distributes routing information across its connections. It can be associated with a virtual private gateway (VGW) through a private VIF or with a Transit Gateway through a transit VIF. A DXGW is a globally available resource that you can connect to any AWS Region. It behaves like a BGP route reflector from AWS to your on-premises network: it is responsible for advertising routes (CIDR prefixes on AWS) to your on-premises network and for bridging the traffic from your on-premises network to a VPC on AWS through Transit Gateway or a VGW.
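The DXGW association rules described above can be captured in a small model that rejects invalid combinations before anything is provisioned. This is only a sketch: the class, the resource names, and the VLAN numbers are hypothetical, and it does not call any AWS API. It assumes that only private and transit VIFs attach to a DXGW, so a public VIF is rejected.

```python
from dataclasses import dataclass, field

# Gateway type that each VIF type reaches through a DXGW.
VIF_TO_GATEWAY = {"private": "vgw", "transit": "tgw"}


@dataclass
class DirectConnectGateway:
    name: str
    vifs: list = field(default_factory=list)          # (vif_name, vif_type, vlan)
    associations: list = field(default_factory=list)  # (gateway_id, gateway_type)

    def attach_vif(self, vif_name, vif_type, vlan):
        if vif_type not in VIF_TO_GATEWAY:
            raise ValueError(f"{vif_type} VIFs do not attach to a DXGW")
        if not 1 <= vlan <= 4094:
            raise ValueError("802.1Q VLAN IDs range from 1 to 4094")
        self.vifs.append((vif_name, vif_type, vlan))

    def associate(self, gateway_id, gateway_type):
        # A gateway is only reachable if a matching VIF type is attached.
        reachable = {VIF_TO_GATEWAY[vif_type] for _, vif_type, _ in self.vifs}
        if gateway_type not in reachable:
            raise ValueError(f"no VIF on {self.name} can reach a {gateway_type}")
        self.associations.append((gateway_id, gateway_type))


# Hypothetical names: one transit VIF feeding the hub Transit Gateway.
dxgw = DirectConnectGateway("dxgw-shared")
dxgw.attach_vif("tvif-signaling", "transit", 101)
dxgw.associate("tgw-hub", "tgw")
print(dxgw.associations)
```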

A Shared Networks account is typically owned and operated by the networks team, the same team who manages your on-premises networking.

Figure 1: Shared Networks account with spoke accounts

The preceding figure shows an example of a Shared Networks account functioning as a hub. It has an interconnect with on-premises, and two application accounts.

Network segmentation

Telco networks are traditionally made up of network segments with virtual routing and forwarding (VRFs). These network segments are isolated and have their own security controls and defined traffic types. Traffic is separated through these network segments, such as signaling, diameter, voice, data, operational and maintenance (O&M) traffic. The count of network segments can sometimes be in the double-digits, thus adding complexity to the network. One of the main considerations in deploying Telco workloads on AWS is how to achieve similar multiple network segmentation on AWS or how to extend multiple existing network segments on-premises to AWS.

One solution is to use multi-VPC ENI attachments. Multi-VPC elastic network interface (ENI) attachment is a VPC feature that allows an Amazon Elastic Compute Cloud (Amazon EC2) instance (running network functions or a virtual appliance) to have multiple network interfaces, each belonging to a distinct VPC. For readability, let’s call these distinct VPCs VRF-VPCs. Each VRF-VPC represents an individual network segment extended from an on-premises VRF as a VLAN over Direct Connect through the DXGW. These VRF-VPCs reside in the Shared Networks account. The subnets in the VRF-VPCs are shared with the application accounts using AWS Resource Access Manager (AWS RAM). Application accounts can create network interfaces in those subnets and attach them as additional interfaces to an EC2 instance.

In the following example, the Shared Networks account has a Direct Connect link to the on-premises network. Over Direct Connect, multiple Transit VIFs are established using a DXGW and a Transit Gateway. Each Transit VIF interconnects a single on-premises VRF with a single VPC in the Shared Networks account, essentially stretching the VRF to AWS. The ENIs in the VRF-VPCs are the network interfaces that the network functions (NFs) running in the application accounts use to access the desired VRF network, made possible by multi-VPC ENI attachment.
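To make the one-VRF-per-VPC mapping concrete, the following sketch plans the additional interfaces for a single network function instance. The VRF names, VLAN tags, and CIDRs are hypothetical, and the function only produces a plan; creating and attaching the ENIs is left to your provisioning tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VrfSegment:
    vrf: str       # on-premises VRF name
    vlan: int      # 802.1Q tag of the Transit VIF carrying this VRF
    vpc_cidr: str  # CIDR of the VRF-VPC in the Shared Networks account


def plan_interfaces(segments, first_device_index=1):
    """Assign one additional ENI per VRF-VPC to a network function instance.

    Device index 0 is left for the primary interface in the instance's own VPC.
    """
    vlans = [segment.vlan for segment in segments]
    if len(set(vlans)) != len(vlans):
        raise ValueError("each VRF needs its own VLAN tag")
    return [
        {"device_index": index, "vrf": s.vrf, "vlan": s.vlan, "vrf_vpc_cidr": s.vpc_cidr}
        for index, s in enumerate(segments, start=first_device_index)
    ]


# Hypothetical segments stretched from on-premises.
segments = [
    VrfSegment("signaling", 101, "10.101.0.0/24"),
    VrfSegment("diameter", 102, "10.102.0.0/24"),
    VrfSegment("oam", 103, "10.103.0.0/24"),
]
interface_plan = plan_interfaces(segments)
for entry in interface_plan:
    print(entry)
```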

Figure 2: Shared Networks account with VRF segments

The preceding design pattern is referenced from the architecture patterns in this post but modified to fit into a Landing Zone architecture with a shared-networks account. This post outlines other patterns that can be considered to achieve VRF segmentation from on-premises to AWS.

Packet data network breakout

Data network breakout, or user plane breakout, of network workloads is another consideration in a network design. There are two options: you can break out the data network through AWS using an Internet Gateway (IGW), or send the traffic back on-premises and use your existing Internet Service Provider (ISP) connection. An IGW is a horizontally scaled, redundant, and highly available VPC component that allows communication between the VPC and the internet. It supports IPv4 and IPv6 traffic. When you use an IGW for breakout, you can use AWS public addresses (IPv4 or IPv6) or bring your own public addresses to AWS.

User plane breakout to public data network (such as internet) through IGW

In the following architecture, the network workload VPC is attached to the Transit Gateway of the Shared Networks account. The network workload has a dedicated subnet and uses the IGW as the data network breakout. The workload has one or more assigned public IP addresses (such as Elastic IP addresses). The Elastic IP address is used as the source address to send or receive traffic from the internet.

Figure 3: Shared Networks account with application account via IGW breakout

User plane breakout through the CSP’s on-premises network

In the following architecture, the user plane traverses back to the on-premises network before breaking out to the internet. Some users prefer this approach so that they can use their existing internet connection and other data network applications residing on-premises.

Figure 4: Shared Networks account with application account via On-premises breakout

The preceding patterns are referenced from this post but modified to fit into a Landing Zone architecture with a Shared Networks account.
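From a routing point of view, the two breakout options differ mainly in where the default route of the user plane subnet points. The following sketch models a route table as a dictionary and resolves the next hop by longest-prefix match, which is how a VPC route table selects the most specific route. The CIDRs and target names are hypothetical.

```python
import ipaddress


def next_hop(route_table, destination):
    """Return the target of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    return max(matches)[1]


# Option 1: break out through the IGW in the workload VPC.
igw_breakout = {
    "10.20.0.0/16": "local",
    "10.0.0.0/8": "tgw-hub",
    "0.0.0.0/0": "igw-userplane",
}
# Option 2: send everything back through Transit Gateway to on-premises.
onprem_breakout = {
    "10.20.0.0/16": "local",
    "0.0.0.0/0": "tgw-hub",
}

print(next_hop(igw_breakout, "198.51.100.7"))
print(next_hop(onprem_breakout, "198.51.100.7"))
```

With the first table, internet-bound traffic leaves through the IGW; with the second, the same destination is handed to the Transit Gateway and breaks out on-premises.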

DNS on AWS

Amazon Route 53 Resolver
Amazon Route 53 provides a service to create and manage public and private DNS zones on AWS. Route 53 Resolver responds recursively to DNS queries from AWS resources for public records, Amazon VPC-specific DNS names, and Route 53 private hosted zones, and is available by default in all VPCs. This provides the ability to resolve local VPC DNS names (such as EC2 instance names), domains attached through private hosted zones (covered in the following sections) and public domains.

Hybrid DNS with Route 53
As mentioned previously, Route 53 Private Hosted Zones can be created and attached to VPCs, allowing the private resolution of user managed DNS names within the VPC. Users migrating workloads into a multi-account architecture require the ability to take this further and resolve DNS names between AWS, on-premises and between AWS accounts.

AWS has a best practice multi-account DNS architecture, which can be used to achieve a hybrid DNS architecture where some domains reside on-premises and in AWS with the ability to resolve domain names in both environments:

Figure 6: DNS architecture AWS and on premises

A central DNS VPC is established in a DNS account (another account, such as Network Services, could be reused). This VPC has its own DNS resolver that resides at the VPC CIDR + 2.
Route 53 Resolver endpoints are used in the central DNS VPC to forward DNS requests to on-premises (outbound resolver) or receive them from on-premises (inbound resolver). For high availability, these are created across multiple Availability Zones (AZs).

For inbound resolution from on-premises:

Private hosted zones are created as sub-domains in each application account and associated with the central DNS VPC. In this case, the DNS VPC has knowledge of the app1.aws.example.com DNS zone when receiving inbound requests.
An inbound request to web1.app1.aws.example.com is forwarded to the Inbound Resolver in the DNS VPC.
The Inbound Resolver is able to resolve the DNS request locally because the app1.aws.example.com private hosted zone was associated with the central DNS VPC.

For outbound resolution on-premises:

DNS forwarding rules are created in the DNS account and shared with the application account using AWS RAM. The forwarding rule is associated with the application account VPC, instructing it to forward any requests for on-premise.example.com or aws.example.com to the outbound resolvers.
A DNS request is triggered from the application account for web1.on-premise.example.com, and the VPC Resolver forwards this to the outbound resolver.
The outbound resolver forwards requests to the on-premises DNS servers for resolution.
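The resolution flow above can be sketched in a few lines. The first helper computes the VPC resolver address (VPC CIDR + 2). The second picks the forwarding rule that applies to a query, assuming the most specific matching domain wins and that unmatched names are answered by the VPC resolver itself. The rule names are hypothetical labels, not real resource identifiers.

```python
import ipaddress


def vpc_resolver_address(vpc_cidr):
    """Return the VPC DNS resolver address: the base of the VPC CIDR plus two."""
    return str(ipaddress.ip_network(vpc_cidr).network_address + 2)


def match_rule(rules, query_name, default="vpc-resolver"):
    """Pick the forwarding rule whose domain is the most specific suffix match."""
    labels = query_name.rstrip(".").lower().split(".")
    best = None
    for domain, rule_name in rules.items():
        domain_labels = domain.rstrip(".").lower().split(".")
        is_suffix = labels[-len(domain_labels):] == domain_labels
        if is_suffix and (best is None or len(domain_labels) > best[0]):
            best = (len(domain_labels), rule_name)
    return best[1] if best else default


# Hypothetical forwarding rules shared from the DNS account.
rules = {
    "on-premise.example.com": "fwd-onprem",
    "aws.example.com": "fwd-aws",
}

print(vpc_resolver_address("10.20.0.0/16"))
print(match_rule(rules, "web1.on-premise.example.com"))
print(match_rule(rules, "www.example.org"))
```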

Security

The Security Pillar of the AWS Well-Architected Framework describes how to use cloud technologies to help protect data, systems, and assets in a way that can improve an AWS user’s security posture. It provides in-depth, best practice guidance for architecting secure workloads on AWS. The Security Pillar is made up of seven design principles to help strengthen Telco cloud security:

Implement a strong identity foundation
Enable traceability
Apply security at all layers
Automate security best practices
Protect data in transit and at rest
Keep people away from data
Prepare for security events

At AWS, security is job zero and an important consideration for any CSP migration to the public cloud. This section summarizes the key areas that should be considered when adopting cloud.

Landing Zone
The Landing Zone is the entry point for application teams deploying workloads on AWS and can be used to simplify the implementation of security guardrails across all AWS accounts. AWS Organizations is used in a multi-account architecture to create logical groupings of accounts called Organizational Units (OUs). These OUs can be targeted with preventative guardrails called Service Control Policies (SCPs), allowing for the control of permissions. Applying these SCPs is made easier through the use of AWS Control Tower, which is a managed service for establishing a Landing Zone that has a large number of out-of-the-box controls to be applied.

AWS has a number of security tools such as AWS CloudTrail, AWS Config, Amazon Macie, AWS Security Hub, Amazon GuardDuty, AWS IAM Access Analyzer, and AWS Firewall Manager. These can be enabled and managed at an organizational level to cover all accounts and provide centralized visibility. Tools such as AWS Config can be used to implement detective controls, identifying non-compliant configurations and taking remediation or notification actions.

Identity and Access Management
Access to AWS through the AWS Management Console, AWS Command Line Interface (AWS CLI), and SDKs should follow the principle of least privilege. Access to multiple AWS accounts can be achieved through AWS Identity and Access Management (IAM) Identity Center, which allows permissions to be managed centrally and integrates with external identity systems such as Active Directory, Okta, and OneLogin. It’s advisable to create a break-glass role that can be used to access AWS in cases where external identity systems are unreachable. Non-human roles should be regularly audited to validate if they are actively being used, and processes should be in place to remove roles that are not in use.
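The audit of non-human roles can be partly automated. The following sketch flags roles that have not been used within a chosen window. It assumes role records shaped like the IAM GetRole response (a RoleName and an optional RoleLastUsed.LastUsedDate); the role names, the fixed reference date, and the 90-day threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_roles(roles, now, max_idle_days=90):
    """Return names of roles never used, or not used within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for role in roles:
        last_used = role.get("RoleLastUsed", {}).get("LastUsedDate")
        if last_used is None or last_used < cutoff:
            stale.append(role["RoleName"])
    return stale


# Hypothetical role records and a fixed reference time for repeatability.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
roles = [
    {"RoleName": "nf-orchestrator", "RoleLastUsed": {"LastUsedDate": now - timedelta(days=3)}},
    {"RoleName": "legacy-batch", "RoleLastUsed": {"LastUsedDate": now - timedelta(days=200)}},
    {"RoleName": "never-used", "RoleLastUsed": {}},
]
candidates = stale_roles(roles, now)
print(candidates)
```

The output is a list of candidates for review rather than for automatic deletion, because a break-glass role is expected to be idle most of the time.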

Data protection

Once the Landing Zone is established, the infrastructure and application related to the workload are deployed where data needs to be protected in use, in transit, and at rest:

Data in process: used to describe the ability to protect sensitive data in use by encrypting it while it is being processed by the compute. AWS offers a number of solutions to protect data in process with the use of AWS Nitro instance types. The AWS Nitro System is the underlying platform for all modern AWS compute instances. With Nitro instances, there is no mechanism for any system or person to log in to Amazon EC2 servers, read the memory of EC2 instances, or access any data on encrypted Amazon Elastic Block Store (Amazon EBS) volumes. In addition, AWS offers instances with memory encryption enabled. These include AWS Graviton-based instances (such as AWS Graviton2 and AWS Graviton3), instances with third-generation Intel Xeon Scalable processors (Ice Lake), such as M6i, or fourth-generation Intel Xeon Scalable processors (Sapphire Rapids), such as M7i, and instances with third-generation AMD EPYC processors (Milan), such as M6a, or fourth-generation AMD EPYC processors (Genoa), such as M7a.
Data in transit: used to describe the protection of data against potential attackers while it moves over an established and authenticated connection. A combination of techniques can be used to achieve this. At the application level, encryption in transit can be achieved by using and enforcing protocols such as TLS for communication. At the network level, traffic can be encrypted end-to-end using IPsec or through the built-in encryption offered by the Nitro System, where the underlying hardware automatically encrypts in-transit traffic between Nitro instances. At the interconnect level, dedicated Direct Connect connections can be used with MACsec enabled.
Data at rest: used to describe the protection of data when it is stored. All data should be encrypted at rest and AWS provides a number of key management solutions to achieve this such as AWS Key Management Service (AWS KMS), AWS CloudHSM, and AWS KMS External Key Store. AWS KMS keys have integration with most AWS services to make it easy to encrypt data, and this should be enforced using SCPs or non-compliance detected with AWS Config.
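A detective control for encryption at rest boils down to evaluating each resource against a rule. The following sketch mimics that evaluation for EBS volumes. It assumes resource records that loosely follow the shape of an AWS Config configuration item (a resourceType and a configuration with an encrypted flag); treat those field names and the sample records as assumptions rather than an exact schema.

```python
def evaluate_volume(item):
    """Classify a resource record against an 'EBS volumes must be encrypted' rule."""
    if item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    encrypted = item.get("configuration", {}).get("encrypted", False)
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


# Hypothetical resource records.
items = [
    {"resourceId": "vol-0aaa", "resourceType": "AWS::EC2::Volume",
     "configuration": {"encrypted": True}},
    {"resourceId": "vol-0bbb", "resourceType": "AWS::EC2::Volume",
     "configuration": {"encrypted": False}},
    {"resourceId": "i-0ccc", "resourceType": "AWS::EC2::Instance",
     "configuration": {}},
]
results = {item["resourceId"]: evaluate_volume(item) for item in items}
print(results)
```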

Infrastructure security
Within the AWS cloud, patterns can be implemented that provide greater security and segmentation. Some of these are:

Establish patterns for VPCs that are secure, for example patterns for public and private workloads. A workload that requires public internet access has dedicated public subnets that route traffic to the VPC IGW. Private subnets should have access outbound to the internet through the NAT Gateway, but no inbound access.
Use VPC Endpoints to establish access to AWS Service APIs to ensure access is not through the internet.
Use Transit Gateway or AWS Cloud WAN to create different route tables/segments and control access between these segments.
Protect against volumetric and application attacks using AWS WAF and AWS Shield Advanced.
Use Route 53 Resolver DNS Firewall to prevent malicious domains from being resolved through the VPC DNS server.
Use security groups and network ACLs (NACLs) to restrict traffic to a workload or subnet.
Use AWS Network Firewall or a third-party firewall for east/west or north/south inspection of traffic.
Use Direct Connect or AWS Site-to-Site VPNs to establish private connectivity to on-premises.
When using Direct Connect, consider using a dedicated connection with MACsec to implement link-layer encryption.
Use the DXGW to segment VIFs if necessary. If VIFs share the same DXGW, then they are considered to be in a single route domain.
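The public and private subnet pattern from the first item in this list lends itself to an automated check. The sketch below inspects the default route of each subnet and reports deviations. The subnet records are hypothetical, and the check relies only on the igw- prefix that internet gateway IDs carry.

```python
def check_subnet_pattern(subnets):
    """Return findings for subnets whose default route breaks the pattern."""
    findings = []
    for subnet in subnets:
        default = subnet["routes"].get("0.0.0.0/0", "")
        if subnet["tier"] == "public" and not default.startswith("igw-"):
            findings.append(f"{subnet['name']}: public subnet has no default route to an IGW")
        if subnet["tier"] == "private" and default.startswith("igw-"):
            findings.append(f"{subnet['name']}: private subnet routes directly to an IGW")
    return findings


# Hypothetical subnets; the last one violates the private pattern.
subnets = [
    {"name": "web-a", "tier": "public", "routes": {"0.0.0.0/0": "igw-0example"}},
    {"name": "app-a", "tier": "private", "routes": {"0.0.0.0/0": "nat-0example"}},
    {"name": "db-a", "tier": "private", "routes": {"0.0.0.0/0": "igw-0example"}},
]
findings = check_subnet_pattern(subnets)
print(findings)
```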

CI/CD security
Automation with continuous integration/continuous delivery (CI/CD) and/or Infrastructure-as-Code (IaC) has the potential to reduce the risk and increase the speed of changes on AWS. The following patterns improve security:

Integrate linting and security scanning within pipelines to ensure continuous validation of code/image quality and security.
Add approval steps to pipelines to allow validation of changes before they are promoted to a production environment. IaC tools such as Terraform and AWS CloudFormation can show the changes that will be executed.
Use Amazon Elastic Container Registry (Amazon ECR) native capabilities to scan container images for vulnerabilities continuously, on-push, or on-demand, so that vulnerable images can be identified.
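A pipeline can turn image scan results into a promotion decision. The following sketch assumes a dictionary of finding counts per severity, similar in shape to the severity counts that an image scan reports; the blocking levels are a policy choice made up for this example.

```python
BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")


def promotion_allowed(severity_counts, blocking=BLOCKING_SEVERITIES):
    """Allow promotion only when no finding exists at a blocking severity."""
    return all(severity_counts.get(level, 0) == 0 for level in blocking)


# Hypothetical scan summaries for two image tags.
clean_image = {"MEDIUM": 4, "LOW": 11}
risky_image = {"CRITICAL": 1, "MEDIUM": 2}

print(promotion_allowed(clean_image))
print(promotion_allowed(risky_image))
```

A pipeline stage would fail the build when the function returns False, leaving the image in the repository for remediation instead of promoting it.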

Logging
Establishing an effective observability architecture is an important detective control to detect, alert on, and mitigate security events. Some of the key best practices are:

Audit API activities across all accounts. Use CloudTrail to simplify the collection of these logs and consolidate them into a central account.
Protect CloudTrail logs from malicious activity, such as deletion, by establishing guardrails.
Set up alerts and, where relevant, automatically remediate security events.
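One way to establish such a guardrail is an SCP that denies changes to CloudTrail trails. The following sketch builds the policy document as a Python dictionary and serializes it to JSON. The statement ID is a hypothetical label, and the policy is only an illustration: in practice you would typically exempt the role that administers the trail and attach the policy through AWS Organizations or AWS Control Tower.

```python
import json

# Deny actions that would stop, delete, or alter a CloudTrail trail.
protect_cloudtrail_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectCloudTrail",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(protect_cloudtrail_scp, indent=2)
print(policy_json)
```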

Summary

When planning for a cloud adoption with AWS, it’s crucial to consider the unique technical requirements derived from Telco workloads. This post emphasizes the importance of carefully designing and implementing robust networking and DNS solutions, as well as implementing strong security measures across all layers, aligning with Telco use cases. Essential concepts and patterns, such as centralized networking, network segmentation techniques, secure Landing Zones, and data protection need to be approached from a Telco perspective. Where similar solutions already exist on-premises, they should be adapted to support these patterns, or newly created if they do not exist. By meticulously planning and implementing these patterns tailored to your specific requirements, you can achieve the flexibility, scalability, and resilience necessary to support a diverse range of Telco workloads on AWS, built upon a solid cloud foundation. This deliberate approach enables you to use the full benefits of the cloud while maintaining the high standards and compliance mandates of the Telco industry.

AWS for Industries