Networking & Content Delivery
Hybrid inspection architectures with AWS Local Zones
Customers often ask about hybrid security inspection architecture patterns for latency-sensitive applications: they want to run their workloads inside of AWS Local Zones and perform security inspection without compromising latency. In this post, we share hybrid inspection architectures and their traffic flows, where both the workloads and the security inspection appliances run inside the Local Zone, allowing customers to access their workloads from on-premises via AWS Direct Connect not only with low latency and consistent performance but also with traffic inspection. We also share some considerations and limitations.
In 2019, we launched our first AWS Local Zone in Los Angeles, a type of infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industry centers. Local Zones let customers deliver applications that require very low latency (single-digit millisecond) or local data processing using familiar APIs and tool sets. Each Local Zone is a logical part of its parent AWS Region. Customers can extend their Amazon Virtual Private Cloud (Amazon VPC) by creating a new subnet that has a Local Zone assignment; when you create a subnet in a Local Zone, you extend the VPC to that Local Zone.
Many customers have on-premises data centers and connect to their workloads running in AWS via Direct Connect, which provides a private network connection between their facilities and AWS. In many circumstances, private network connections can reduce costs, increase bandwidth, and provide a more consistent network experience than Internet-based connections. In conjunction, customers often use AWS Transit Gateway, a network transit hub, to connect their VPCs and on-premises networks over Direct Connect. In June 2022, AWS announced AWS Direct Connect support for all AWS Local Zones. Prior to this announcement, connectivity from a Direct Connect location to a workload running in a Local Zone (except for the Los Angeles Local Zones) would flow through the parent AWS Region of that specific Local Zone. This translated into increased latency when you wanted a private and dedicated connection between your on-premises locations and Local Zones. With this announcement, network traffic takes the shortest path between Direct Connect locations and Local Zones, for both current and future Local Zones. This new routing behavior reduces the distance that network traffic must travel, thereby decreasing latency and helping make your applications more responsive.
Customers also have hybrid security inspection requirements, such as deep packet inspection (DPI), application protocol detection, domain name filtering, and intrusion prevention system (IPS). They may use AWS native services (e.g., AWS Network Firewall) or third-party virtual security appliances for workloads running in the AWS Cloud via Gateway Load Balancer (GWLB), a managed service that makes it easy to deploy essential network protections for all of your workloads running in a VPC. However, these services are consumed via VPC endpoints (powered by AWS PrivateLink), which today can't be created inside a Local Zone subnet.
Local Zone hybrid inspection architectures
In this section, we discuss three inspection architectures and traffic flows for traffic between an on-premises data center and a Local Zone over Direct Connect. Today, AWS native security services, such as Network Firewall, GWLB, and AWS Web Application Firewall (AWS WAF), aren't yet supported in Local Zones. Therefore, we show self-managed third-party security appliances/firewalls running on Amazon Elastic Compute Cloud (Amazon EC2). Network Firewall or GWLB in the parent Region can be used to inspect Local Zone workloads, but this is an anti-pattern because traffic hairpins through the parent Region via Transit Gateway, which increases the network round-trip latency and defeats the purpose of using a Local Zone. We also recommend reviewing the AWS Direct Connect Traffic Flow with AWS Local Zone reference architecture to help understand traffic flows between on-premises and the Local Zone through Direct Connect.
1) Third-party firewall instance with Amazon EC2 Auto Recovery
An important rule when building a highly available and reliable system is to design for failure. Although modern data centers, networks, and servers are highly reliable, they aren't immune to occasional failures. In Amazon EC2, numerous system status checks monitor an instance and the other components that must be running for your instance to function as expected. Among other things, the checks look for loss of network connectivity, loss of system power, software issues on the physical host, and hardware issues on the physical host. Amazon EC2 is designed with an availability design goal of 99.95% for a single-Availability Zone (AZ) data plane. Refer to Appendix A: Designed-For Availability for Select AWS Services in the AWS Well-Architected Reliability Pillar documentation for more details. Amazon EC2 Auto Recovery, available since 2015, is a feature built into Amazon EC2 that automatically recovers an instance if it becomes impaired due to an underlying hardware issue. In 2022, we announced that automatic recovery is enabled by default for supported instance types, which makes it even easier for customers to recover an instance when it becomes unreachable.
In the following figure, the self-managed security appliance/firewall instance runs in a Local Zone on Amazon EC2 in a separate Firewall Subnet with the default auto-recovery feature enabled (represented with label (A) in the diagram). The App EC2 instance running in the App Subnet represents a latency-sensitive application running inside the Local Zone, reachable from the on-premises data center via Direct Connect. All traffic between the on-premises data center and the App EC2 instance must be inspected by the firewall instance.
Note that each EC2 instance performs source/destination checks by default. This means that the instance must be the source or destination of any traffic it sends or receives. However, a firewall instance must be able to send and receive traffic when the source or destination is not itself. Therefore, you must disable source/destination checks on the firewall instance.
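Disabling the check is a single ModifyInstanceAttribute call. The following is a minimal sketch; `ec2` is assumed to be a boto3-style EC2 client (for example, `boto3.client("ec2")`), and the instance ID is a placeholder:

```python
# Sketch: disable source/destination checks on a firewall instance.
# `ec2` is any client exposing modify_instance_attribute, such as
# boto3.client("ec2"); the instance ID below is a placeholder.
def disable_src_dst_check(ec2, instance_id):
    # A firewall must forward traffic it neither sources nor terminates,
    # so the EC2 source/destination check must be turned off.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        SourceDestCheck={"Value": False},
    )
```

The same call can be applied to each firewall instance in the high-availability scenarios later in this post.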
Packet walkthrough
The following steps describe a packet walkthrough:
Step 1: Traffic from on-premises network 172.16.0.0/16 destined for the application hosted in the Local Zone App subnet 10.0.16.0/24 (App EC2) traverses the private virtual interface (VIF) and enters the VPC through the Virtual Private Gateway (VGW).
Step 2: The Edge Route Table associated with the VGW is evaluated, which contains a more specific route to the App subnet 10.0.16.0/24 pointing to the Elastic Network Interface (ENI) of the firewall instance as the target. The traffic is sent to the ENI of the firewall instance for inspection.
Step 3: After inspection, the Firewall Subnet Route Table associated with the firewall subnet is evaluated, which contains a local route to 10.0.0.0/16. Traffic is sent to the App EC2.
Step 4: For the return traffic from the App EC2 to the on-premises network 172.16.0.0/16, the App Subnet Route Table associated with the App subnet in the Local Zone is evaluated which contains a route to 172.16.0.0/16 pointing to the ENI of the firewall instance as target. Traffic enters the firewall instance for inspection.
Step 5: After inspection of the return traffic, the Firewall Subnet Route Table associated with the firewall subnet is evaluated which contains a route to 172.16.0.0/16 pointing to the VGW as target.
Step 6: At the VGW, return traffic is sent toward on-premises through the private VIF.
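The route tables used in steps 1 through 6 can be provisioned programmatically. The following is a hedged sketch, assuming a boto3-style EC2 client; the route table, ENI, and VGW IDs are placeholders:

```python
# Sketch: provision the three route tables from the packet walkthrough.
# All IDs are placeholders; `ec2` is a boto3-style EC2 client.
def configure_inspection_routes(ec2, edge_rtb, app_rtb, fw_rtb,
                                fw_eni, vgw_id,
                                app_cidr="10.0.16.0/24",
                                onprem_cidr="172.16.0.0/16"):
    # Edge route table (associated with the VGW): steer inbound
    # on-premises traffic for the App subnet to the firewall ENI (step 2).
    ec2.create_route(RouteTableId=edge_rtb,
                     DestinationCidrBlock=app_cidr,
                     NetworkInterfaceId=fw_eni)
    # App subnet route table: return traffic to on-premises goes
    # back through the firewall ENI (step 4).
    ec2.create_route(RouteTableId=app_rtb,
                     DestinationCidrBlock=onprem_cidr,
                     NetworkInterfaceId=fw_eni)
    # Firewall subnet route table: inspected return traffic exits
    # toward on-premises via the VGW (step 5).
    ec2.create_route(RouteTableId=fw_rtb,
                     DestinationCidrBlock=onprem_cidr,
                     GatewayId=vgw_id)
```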
Pros:
- There are no extra charges for the Amazon EC2 Auto Recovery feature. Therefore, this is a cost-effective solution if you're looking to use a third-party firewall instance for security inspection.
- The simplified automatic recovery feature is enabled by default on supported instance types. If your firewall uses one of the supported instance types, then no additional configuration is needed, which reduces complexity. If the firewall uses an unsupported instance type, then the default configuration doesn’t enable automatic recovery. Check the Set the recovery behavior section in our Amazon EC2 documentation for the steps to enable automatic recovery.
- When the automatic recovery event succeeds, you’re notified by an AWS Health Dashboard event. When an automatic recovery event fails, you’re notified by an AWS Health Dashboard event and by email.
Cons:
- Amazon EC2 Auto Recovery only works when there is a failure with the underlying hardware (system status checks fail). If the system status checks pass but the instance status checks fail for some reason, then Amazon EC2 Auto Recovery doesn’t automatically recover your instance. When an instance status check fails, you typically must address the problem yourself (for example, by rebooting the instance or by making instance configuration changes).
- The Amazon EC2 control plane runs in the parent Region associated with the Local Zone. Therefore, in case of a network connectivity issue between the Local Zone and the parent Region, the Amazon EC2 Auto Recovery feature will fail to recover the instance automatically.
- There are limitations associated with the Amazon EC2 Auto Recovery feature. Review the Limitations section to learn more.
- Your firewall instance may be temporarily unavailable to process traffic when the Amazon EC2 Auto Recovery feature is recovering your instance from a failure. This process typically takes a few minutes and your application traffic passing through the firewall might be blackholed during this time.
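For firewall instance types that don't get automatic recovery by default, one common pattern is a CloudWatch alarm on the system status check with the EC2 recover action. The following is a sketch, assuming a boto3-style CloudWatch client; the alarm name, evaluation periods, and instance ID are illustrative choices:

```python
# Sketch: CloudWatch alarm that triggers the EC2 recover action when
# system status checks fail. `cloudwatch` is a boto3-style client;
# the alarm name and thresholds below are illustrative.
def create_recovery_alarm(cloudwatch, instance_id, region,
                          alarm_name="firewall-auto-recover"):
    cloudwatch.put_metric_alarm(
        AlarmName=alarm_name,
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",
        Period=60,                 # one status check datapoint per minute
        EvaluationPeriods=3,       # recover after 3 consecutive failures
        Threshold=1.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        # The EC2 recover action stops and restarts the instance on
        # healthy hardware, keeping its instance ID and private IPs.
        AlarmActions=[f"arn:aws:automate:{region}:ec2:recover"],
    )
```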
2) Third-party firewall instances in High Availability with automatic failover using built-in vendor capabilities
This scenario leverages two firewall instances (Active/Standby) running as a high-availability pair in the Local Zone, as shown in the following two figures. In this high-availability mode, the firewalls are generally clustered and their configuration and session information are synchronized. Therefore, seamless failover when a peer goes down depends heavily on specific vendor capabilities (such as GRE/IPsec tunnels, heartbeat polling, and hello messages). Check with your firewall vendor to determine the capabilities. You may also have to disable source/destination checks on the firewall instances to enable them to route traffic between the VGW and the App subnet. If the Standby firewall detects problems reaching the Active firewall, it automatically calls the Amazon EC2 APIs (the ReplaceRoute API) to update the routes in the Edge route table and the App subnet route table to point to the ENI of the Standby firewall as the target. After this update, traffic is directed to the Standby firewall for inspection until the Active firewall is reachable again. The firewall's custom health checks and the Amazon EC2 APIs are represented with label (A) in the following diagrams.
Packet walkthrough
These steps outline a packet walkthrough:
Step 1: Traffic from on-premises network 172.16.0.0/16 destined to the application hosted in the Local Zone App subnet (App EC2) traverses the private VIF and enters the VPC through the VGW.
Step 2: The Edge Route Table associated with the VGW is evaluated which contains a more specific route to the App subnet 10.0.16.0/24 pointing to the ENI (eni-1234) of the Active firewall instance as the target. Traffic is sent to the ENI of the Active firewall instance for inspection.
Step 3: After inspection, the Firewall Subnet Route Table associated with the firewall subnet is evaluated which contains a local route to 10.0.0.0/16. Traffic is sent to the App EC2.
Step 4: For the return traffic from the App EC2 to the on-premises network 172.16.0.0/16, the App Subnet Route Table associated with the App subnet in the Local Zone is evaluated which contains a route to 172.16.0.0/16 pointing to the ENI (eni-1234) of the Active firewall instance as the target. Traffic enters the Active firewall instance for inspection.
Step 5: After inspection of the return traffic, the Firewall Subnet Route Table associated with the firewall subnet is evaluated which contains a route to 172.16.0.0/16 pointing to the VGW as target.
Step 6: At the VGW, return traffic is sent toward on-premises through the private VIF.
In the event of failure of the Active firewall instance, the Standby firewall uses the Amazon EC2 APIs to update the route for the App subnet 10.0.16.0/24 in the Edge Route Table and the route for on-premises network 172.16.0.0/16 in the App Subnet Route Table to point to the ENI (eni-5678) of the Standby firewall as the target. Traffic between the App subnet and the VGW now traverses the Standby firewall as shown in the previous figure. The remainder of the traffic flow remains unchanged.
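The route updates described above amount to two ReplaceRoute calls. A minimal sketch, assuming a boto3-style EC2 client; the route table and ENI IDs are placeholders matching the diagram:

```python
# Sketch: repoint both route tables at the Standby firewall's ENI.
# All IDs are placeholders; `ec2` is a boto3-style EC2 client.
def fail_over_to_standby(ec2, edge_rtb, app_rtb, standby_eni,
                         app_cidr="10.0.16.0/24",
                         onprem_cidr="172.16.0.0/16"):
    # Edge route table: inbound App subnet traffic now targets
    # the Standby ENI.
    ec2.replace_route(RouteTableId=edge_rtb,
                      DestinationCidrBlock=app_cidr,
                      NetworkInterfaceId=standby_eni)
    # App subnet route table: on-premises-bound return traffic now
    # targets the Standby ENI as well.
    ec2.replace_route(RouteTableId=app_rtb,
                      DestinationCidrBlock=onprem_cidr,
                      NetworkInterfaceId=standby_eni)
```

In practice this logic runs on the Standby firewall itself (via vendor tooling calling the EC2 APIs), so the instance needs an IAM role permitting `ec2:ReplaceRoute`.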
Pros:
- This solution relies on firewall vendor custom health checking mechanisms to verify the liveness of the peer firewall instance. Depending on the mechanism used, you may be able to configure the health check interval and timeout values, which gives you control over the failover times. Make sure that you consult the specific vendor capabilities before using this approach.
- With the help of firewall vendor custom health checking mechanisms (heartbeats, etc.), you can detect all types of failures of the firewall instances (both system status check and instance status check failures).
Cons:
- Not all firewall vendors support custom health checking mechanisms. Check your preferred firewall vendor’s documentation to learn if they support this configuration.
- Running more than one firewall instance (in high availability) involves additional costs. You may also need additional licenses on the firewall instances (for example, to enable a BGP process or IPsec capabilities) for the health checking mechanisms. Check your firewall vendor documentation to learn more.
- Customers are responsible for the additional health-checking configuration and maintenance on the third-party firewall instances, which can add to complexity and operational overhead.
3) Third-party firewall instances in high availability with automatic failover using AWS Lambda
This scenario leverages two self-managed firewall instances (Active/Standby) running as a high-availability pair in the Local Zone on Amazon EC2, as shown in the following two figures. Custom automation via an AWS Lambda function running in the parent Region can be configured to periodically check the liveness of both firewall instances. These liveness checks can be Lambda-initiated health checks (such as HTTP/HTTPS/TCP-based), or you can leverage the instances' Amazon CloudWatch health metrics (system and instance status checks). If the Active firewall starts failing health checks, then the Lambda function uses the Amazon EC2 APIs to update the routes in the Edge route table and the App subnet route table to point to the ENI of the Standby firewall as the target. After this update, traffic is directed to the Standby firewall for inspection until the Active firewall is reachable again. Note that you may have to disable source/destination checks on the firewall instances to enable them to route traffic between the VGW and the App subnet. The Lambda function health checks, CloudWatch, and the Amazon EC2 APIs are represented with label (A) in the following diagrams.
Packet walkthrough
These steps describe a packet walkthrough:
Step 1: Traffic from on-premises network 172.16.0.0/16 destined to the application (App EC2) hosted in the Local Zone App subnet traverses the private VIF and enters the VPC through the VGW.
Step 2: The Edge Route Table associated with the VGW is evaluated which contains a more specific route to the App subnet 10.0.16.0/24 pointing to the ENI of the Active firewall instance as the target. Traffic is sent to the ENI (eni-1234) of the Active firewall instance for inspection.
Step 3: After inspection, the Firewall Subnet Route Table associated with the firewall subnet is evaluated which contains a local route to 10.0.0.0/16. Traffic is sent to the App EC2.
Step 4: For the return traffic from the App EC2 to the on-premises network 172.16.0.0/16, the App Subnet Route Table associated with the App subnet in the Local Zone is evaluated which contains a route to 172.16.0.0/16 pointing to the ENI (eni-1234) of the Active firewall instance as target. Traffic enters the Active firewall instance for inspection.
Step 5: After inspection of return traffic, the Firewall Subnet Route Table associated with the firewall subnet is evaluated which contains a route to 172.16.0.0/16 pointing to the VGW as target.
Step 6: At the VGW, return traffic is sent toward on-premises through the private VIF.
In the event of failure of the Active firewall instance, the Lambda function updates the route for the App subnet 10.0.16.0/24 in the Edge Route Table and the route for on-premises network 172.16.0.0/16 in the App Subnet Route Table to point to the ENI (eni-5678) of the Standby firewall as the target. Traffic between the App subnet and the VGW now traverses the Standby firewall as shown in the previous figure. The remainder of the traffic flow remains unchanged.
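A skeleton of such a Lambda handler might look like the following sketch. The health probe is injected as a callable so any HTTP/HTTPS/TCP check can be plugged in; the client, probe, and all IDs are assumptions for illustration:

```python
# Sketch: factory for a Lambda-style handler that probes the Active
# firewall and, on failure, repoints both route tables at the Standby
# ENI. `ec2` is a boto3-style EC2 client; `is_healthy` is any callable
# returning True/False for the Active firewall; IDs are placeholders.
def make_failover_handler(ec2, is_healthy, active_target, standby_eni,
                          edge_rtb, app_rtb,
                          app_cidr="10.0.16.0/24",
                          onprem_cidr="172.16.0.0/16"):
    def handler(event, context):
        if is_healthy(active_target):
            # Active firewall is passing health checks; nothing to do.
            return {"failover": False}
        # Active firewall failed the probe: repoint both route tables.
        for rtb, cidr in ((edge_rtb, app_cidr), (app_rtb, onprem_cidr)):
            ec2.replace_route(RouteTableId=rtb,
                              DestinationCidrBlock=cidr,
                              NetworkInterfaceId=standby_eni)
        return {"failover": True}
    return handler
```

In a real deployment this handler would be invoked on a schedule (for example, an Amazon EventBridge rule), and it would also need guard logic so it doesn't repeat the ReplaceRoute calls once failover has already occurred.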
Pros:
- When you use Lambda-initiated health checks (such as HTTP/HTTPS/TCP-based) to verify the liveness of your firewall instances, the health check intervals and timeouts are configurable. Therefore, you have control over the failover times.
- With Lambda-initiated health checks, you can detect all types of failures of the firewall instances (both system status check and instance status check failures).
Cons:
- When running more than one firewall instance (in high availability) and Lambda function for health-checking and route table updating purposes, there are additional costs involved with this solution. For details on Lambda costs, please visit our Lambda Pricing page.
- Customers are responsible for configuration and maintenance of the Lambda functions, which can add to the complexity and operational overhead.
- The Lambda function runs in the parent Region associated with the Local Zone, so the automation can add to latency and thus can contribute to health check/failover time to the Standby firewall instance.
- Lambda functions currently don’t support ICMP-based health checks. You may have to use a TCP-based health check (such as HTTP/HTTPS/custom TCP port) instead.
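Because ICMP isn't an option from Lambda, a TCP connect probe is a simple substitute. A minimal sketch using only the Python standard library; the host and port would be the firewall's health check endpoint (placeholders here):

```python
import socket

# Sketch: TCP-based liveness probe usable from a Lambda function.
# host/port are placeholders for the firewall's health check endpoint.
def tcp_healthy(host, port, timeout=2.0):
    # Returns True if a TCP connection to host:port succeeds within
    # the timeout, and False on refusal, timeout, or other socket error.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that a successful TCP handshake only proves the port is open; for deeper assurance you could send an HTTP request over the connection and check the response code.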
Considerations
Virtual Private Gateway (VGW): VGW supports gateway route tables, where you can associate a route table for fine-grained control over the routing path of traffic entering your VPC from on-premises. However, you can't route traffic from a VGW to a GWLB endpoint or a Network Firewall endpoint, whether or not a Local Zone is involved; you can, however, target an ENI. Note the Rules and considerations.
- Application Load Balancer (ALB) is supported in Local Zones and is typically used in conjunction with Amazon EC2 Auto Scaling to maintain application availability. However, ALB IP addresses are dynamic and can change when the ALB scales, so you can't assign a static IP address to an ALB. This means that for the hybrid security inspection scenarios mentioned earlier, the VGW can't target an ALB IP to direct incoming traffic from on-premises to the security appliances inside the Local Zone.
- AWS WAF integrates natively with ALB and provides protection against common Layer 7 attack patterns including web exploits, bots, SQL injection, cross-site scripting (XSS), and others. Currently, AWS WAF doesn’t support ALB running inside of the Local Zone.
- Network Load Balancer (NLB) provides a static IP address (one per AZ), but NLB isn't supported in Local Zones today. Therefore, you can't use an NLB to route traffic from the VGW to the security appliances inside of a Local Zone.
- GWLB helps you easily deploy, scale, and manage your third-party virtual appliances. However, you can’t create VPC endpoints (and GWLB) inside of Local Zone subnets today.
Amazon EC2 Auto Scaling: Amazon EC2 now performs automatic recovery of instances by default. However, automatic recovery isn’t initiated for instances inside of an Auto Scaling group. Note the limitations.
Lambda: Lambda isn’t currently supported in Local Zone. In the architectures shown earlier, Lambda runs in the parent Region which could introduce some latency while performing security appliance health checks and failover automation.
Direct Connect Redundancy: It is always recommended to have redundant Direct Connect connections from at least two diverse colocation facilities. Refer to AWS Direct Connect Resiliency Recommendations for details.
At the time of writing this blog, the only Local Zones that support edge association with a virtual private gateway (VGW) are us-west-2-lax-1a and us-west-2-lax-1b (Los Angeles, US), and us-west-2-phx-2a (Phoenix, US).
Conclusion
Local Zones let you use select AWS services, such as compute and storage services, closer to more end-users, thereby providing them with very low latency access to the applications running locally. Direct Connect provides an alternative to using the Internet to connect to AWS via a private network connection between your facilities and AWS. This post discussed some of the hybrid security inspection architecture patterns that help you design and deploy third-party virtual appliances in Local Zones for traffic inspection, while also allowing access from on-premises via Direct Connect. The following table shows a summary comparison between the options discussed earlier:
| Option | Cost | Failure Recovery Time | Solution Complexity | Operational Complexity |
|---|---|---|---|---|
| 1) Firewall instance with Amazon EC2 Auto Recovery | Low | Medium | Low | Low |
| 2) Firewall instances in high availability with failover using vendor capabilities | Medium | Low | Medium | Medium |
| 3) Firewall instances in high availability with failover using Lambda | Medium | Low | High | High |
To get started and learn more, refer to the following resources:
- Local Zones, Direct Connect, and Lambda documentation.
- Direct Connect Traffic Flow with Local Zone Reference Architecture.