Networking & Content Delivery

Improve web application availability with CloudFront and Route 53 hybrid origin failover

Earlier this year, we released technical guidance regarding three advanced design patterns for highly available applications using Amazon CloudFront and Amazon Route 53. In this post, we dive deeper into CloudFront origin failover, Amazon Route 53 DNS failover, and the hybrid origin failover approach to further enhance the availability of your web applications. We also provide an AWS Cloud Development Kit (AWS CDK) solution that you can use to implement and test different high-availability patterns.

Origin Failover feature in CloudFront

CloudFront lets you configure a primary and a secondary origin within an origin group, and specify the HTTP status codes that trigger a failover. When the primary origin responds with one of those status codes (for example, a 502 or 503 server error), or when CloudFront cannot connect to it or the request times out, CloudFront retries the original request against the secondary origin.

Figure 1. This diagram illustrates how CloudFront origin failover works.

This native CloudFront failover is stateless: CloudFront doesn't track the health of the origins, so every incoming request is first routed to the primary origin. Only after that attempt times out or returns a status code configured for failover does CloudFront retry the request against the secondary origin in the group. As a consequence, although failover happens on every request without waiting for a health check, each failed-over request pays for the failed attempt against the primary, which adds latency. Furthermore, although you can configure the cache behavior to allow other methods, CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS.
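To make the stateless behavior concrete, the following TypeScript sketch models a single viewer request passing through an origin group. It is a simplified illustration, not CloudFront's implementation, and the names used (handleViewerRequest, OriginOutcome) are hypothetical. It shows that every request pays for an attempt against the primary, and that only GET, HEAD, and OPTIONS requests are retried.

```typescript
// Hypothetical model of CloudFront origin-group failover, for illustration only.
// CloudFront keeps no origin health state, so every request tries the primary first.

type HttpMethod = "GET" | "HEAD" | "OPTIONS" | "POST" | "PUT" | "PATCH" | "DELETE";

// One attempt against an origin: an HTTP response, or a connection failure/timeout.
type OriginOutcome = { kind: "response"; status: number } | { kind: "timeout" };

interface RequestResult {
  servedBy: "primary" | "secondary";
  status: number | "timeout";
  attempts: number; // origin attempts paid by this single viewer request
}

// Only these viewer methods are retried against the secondary origin.
const FAILOVER_METHODS: HttpMethod[] = ["GET", "HEAD", "OPTIONS"];

function statusOf(outcome: OriginOutcome): number | "timeout" {
  return outcome.kind === "timeout" ? "timeout" : outcome.status;
}

function handleViewerRequest(
  method: HttpMethod,
  failoverStatusCodes: number[],
  primary: () => OriginOutcome,
  secondary: () => OriginOutcome,
): RequestResult {
  const first = primary(); // no health state: the primary is always tried first
  const failed = first.kind === "timeout" || failoverStatusCodes.indexOf(first.status) !== -1;
  if (!failed || FAILOVER_METHODS.indexOf(method) === -1) {
    return { servedBy: "primary", status: statusOf(first), attempts: 1 };
  }
  // Retry the same request against the secondary origin of the group.
  return { servedBy: "secondary", status: statusOf(secondary()), attempts: 2 };
}

// Usage: a primary answering 502 with a healthy secondary.
const failoverCodes = [500, 502, 503, 504];
const badPrimary = (): OriginOutcome => ({ kind: "response", status: 502 });
const goodSecondary = (): OriginOutcome => ({ kind: "response", status: 200 });
const getResult = handleViewerRequest("GET", failoverCodes, badPrimary, goodSecondary);
// getResult: served by the secondary with status 200, after 2 attempts
const postResult = handleViewerRequest("POST", failoverCodes, badPrimary, goodSecondary);
// postResult: the primary's 502 is returned to the viewer
```

The attempts field is the point of the model: even a successful failover costs one extra round trip to the failing primary.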

Route 53 DNS Failover

Alternatively, you can use Route 53 failover routing policies with health checks to implement a stateful failover mechanism for your origin. In this scenario, Route 53 answers DNS queries for the origin domain name with the primary origin's records as long as its health check reports it as healthy. If the primary becomes unhealthy and the secondary is healthy, Route 53 automatically starts answering with the secondary origin's records instead. Note that if both the primary and secondary origins are unhealthy, Route 53 returns the primary record.
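The selection logic Route 53 applies to a failover record pair fits in a few lines. This is a hypothetical model (answerFailoverQuery is not a Route 53 API); the health states it takes as input are what Route 53 tracks between queries, which is what makes this approach stateful.

```typescript
// Hypothetical model of how Route 53 answers a failover record pair.
type FailoverAnswer = "primary" | "secondary";

function answerFailoverQuery(primaryHealthy: boolean, secondaryHealthy: boolean): FailoverAnswer {
  if (primaryHealthy) return "primary";
  if (secondaryHealthy) return "secondary";
  // Both unhealthy: Route 53 returns the primary record rather than no answer.
  return "primary";
}

// All four combinations of (primaryHealthy, secondaryHealthy).
const truthTable = [
  [true, true],
  [true, false],
  [false, true],
  [false, false],
].map(([p, s]) => answerFailoverQuery(p, s));
// truthTable: ["primary", "primary", "secondary", "primary"]
```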

Figure 2. This diagram illustrates how Route 53 DNS failover could be used with CloudFront (healthy primary)

Figure 3. This diagram illustrates how Route 53 DNS failover could be used with CloudFront (primary unhealthy)

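The failover delay depends on the health check's request interval (10 or 30 seconds) and failure threshold (the number of consecutive failed checks, from 1 to 10, before the endpoint is marked unhealthy). Until the primary is marked unhealthy, Route 53 keeps returning its record, so your service can be unavailable during this transition; DNS resolvers that cached the previous answer can extend the delay further. Configure the request interval and failure threshold in the health check settings according to your application's requirements.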
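As a rough, back-of-the-envelope sketch, the time to detect a failure is about the request interval multiplied by the failure threshold, plus, in the worst case, the time a resolver keeps the previous DNS answer cached. The function below is hypothetical and simplifies Route 53's behavior (it ignores that Route 53 aggregates results from health checkers in several locations), so treat its output as an order of magnitude rather than a guarantee.

```typescript
// Rough, single-checker estimate of Route 53 DNS failover delay (a simplification:
// Route 53 aggregates results from health checkers in several locations).
type RequestIntervalSeconds = 10 | 30; // "fast" or "standard" request interval

function estimateFailoverSeconds(
  interval: RequestIntervalSeconds,
  failureThreshold: number, // consecutive failed checks before "unhealthy" (1 to 10)
  cachedTtlSeconds = 0, // worst case: a resolver keeps the old answer this long
): number {
  if (!Number.isInteger(failureThreshold) || failureThreshold < 1 || failureThreshold > 10) {
    throw new RangeError("failureThreshold must be an integer from 1 to 10");
  }
  if (cachedTtlSeconds < 0) throw new RangeError("cachedTtlSeconds must be >= 0");
  return interval * failureThreshold + cachedTtlSeconds;
}

const standard = estimateFailoverSeconds(30, 3); // 90
const fast = estimateFailoverSeconds(10, 3); // 30
const fastWithTtl = estimateFailoverSeconds(10, 3, 60); // 90
```

The trade-off is visible in the numbers: a faster interval or lower threshold shortens the outage window, at the price of more health check requests and a higher risk of failing over on a transient error.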

In summary, CloudFront origin failover reacts immediately, on each request, when the primary origin fails. However, it adds latency, because every request is sent to the primary origin first.

Route 53 DNS failover is stateful: once the primary is marked unhealthy, requests go directly to the secondary origin. However, it takes longer to detect an origin failure. You can combine both solutions to increase availability without significantly affecting performance.

Hybrid CloudFront and Route 53 failover for better availability

The following solution uses a Route 53 failover routing policy so that a single origin domain name covers both of your origins. It then sets up a CloudFront origin group with this domain name as the primary origin and your backup endpoint as the secondary origin.

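The advantage of this setup is that, during the time Route 53 needs to detect the failure and fail over, CloudFront origin failover immediately retries failed requests against the secondary origin, which keeps the application available. Once Route 53 has failed over, the origin domain name resolves to the backup endpoint, so requests no longer pay for a failed first attempt. As mentioned earlier, CloudFront only retries GET, HEAD, and OPTIONS requests; requests with other methods are protected only by the Route 53 failover.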
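The interplay of the two mechanisms can be sketched as a second-by-second simulation. This hypothetical TypeScript model assumes the primary fails at a given time, that Route 53 needs a fixed number of seconds to detect it, and that the secondary endpoint stays healthy. It shows three phases: normal operation, the detection window in which CloudFront retries GET requests (while POST requests fail), and the period after Route 53 has switched the record, when no retry is needed.

```typescript
// Hypothetical second-by-second simulation of the hybrid pattern (illustration only).
// Origin 1 of the CloudFront origin group is the Route 53 failover record;
// origin 2 is the secondary API endpoint, assumed to stay healthy.
type Endpoint = "primary" | "secondary";

interface Tick {
  t: number; // seconds since the start of the simulation
  dnsAnswer: Endpoint; // what the Route 53 failover record resolves to
  servedBy: Endpoint;
  status: number;
  attempts: number; // origin attempts CloudFront made for this request
}

const RETRIED_METHODS = ["GET", "HEAD", "OPTIONS"];

function simulateHybrid(failAt: number, detectionSeconds: number, durationSeconds: number, method = "GET"): Tick[] {
  const ticks: Tick[] = [];
  for (let t = 0; t < durationSeconds; t++) {
    const primaryDown = t >= failAt;
    // Route 53 switches the record only once the health check has detected the failure.
    const dnsAnswer: Endpoint = t >= failAt + detectionSeconds ? "secondary" : "primary";
    const firstStatus = dnsAnswer === "primary" && primaryDown ? 502 : 200;
    if (firstStatus === 502 && RETRIED_METHODS.indexOf(method) !== -1) {
      // CloudFront origin failover: retry against origin 2 of the group.
      ticks.push({ t, dnsAnswer, servedBy: "secondary", status: 200, attempts: 2 });
    } else {
      ticks.push({ t, dnsAnswer, servedBy: dnsAnswer, status: firstStatus, attempts: 1 });
    }
  }
  return ticks;
}

// Primary fails at t=5 s; Route 53 needs 10 s to detect it; run for 20 s.
const getTicks = simulateHybrid(5, 10, 20);
const postTicks = simulateHybrid(5, 10, 20, "POST");
const getErrors = getTicks.filter((x) => x.status !== 200).length; // 0
const getRetries = getTicks.filter((x) => x.attempts === 2).length; // 10 (t = 5..14)
const postErrors = postTicks.filter((x) => x.status !== 200).length; // 10 (t = 5..14)
```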

Figure 4. This diagram illustrates how CloudFront origin failover and Route 53 failover work together.

The solution does the following:

  • Creates an API endpoint using Amazon API Gateway and AWS Lambda in both the primary and backup Regions (each with a custom domain name and certificate)
  • Creates a Route 53 health check for each API endpoint
  • Creates a Route 53 failover DNS record, with alias records pointing to the primary and secondary API endpoints
  • Creates two CloudFront distributions with the following setup:
    • Setup 1: the Route 53 failover DNS record as the origin
    • Setup 2: an origin group with the Route 53 failover DNS record as the primary origin and the secondary API Gateway endpoint as the fallback
  • Exports both CloudFront distributions' domain names so that you can test both solutions
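Setup 2 relies on an origin group. As a sketch of its shape, the following TypeScript builds the OriginGroups entry of a CloudFormation AWS::CloudFront::Distribution configuration. Property names follow CloudFormation; the origin IDs are hypothetical, and the repository's CDK code may define the group differently (for example with the OriginGroup class from aws-cdk-lib/aws-cloudfront-origins). The function also rejects status codes that CloudFront doesn't accept as failover criteria.

```typescript
// Status codes CloudFront accepts as origin-group failover criteria.
const ALLOWED_FAILOVER_CODES = [400, 403, 404, 416, 500, 502, 503, 504];

interface OriginGroupConfig {
  Id: string;
  FailoverCriteria: { StatusCodes: { Quantity: number; Items: number[] } };
  Members: { Quantity: number; Items: { OriginId: string }[] }; // first member = primary
}

function buildOriginGroup(
  id: string,
  primaryOriginId: string,
  secondaryOriginId: string,
  statusCodes: number[],
): OriginGroupConfig {
  if (statusCodes.length === 0) throw new RangeError("at least one failover status code is required");
  const unsupported = statusCodes.filter((c) => ALLOWED_FAILOVER_CODES.indexOf(c) === -1);
  if (unsupported.length > 0) {
    throw new RangeError(`unsupported failover status codes: ${unsupported.join(", ")}`);
  }
  return {
    Id: id,
    FailoverCriteria: { StatusCodes: { Quantity: statusCodes.length, Items: statusCodes.slice() } },
    Members: {
      Quantity: 2, // an origin group has exactly two members
      Items: [{ OriginId: primaryOriginId }, { OriginId: secondaryOriginId }],
    },
  };
}

// Setup 2: the Route 53 failover record first, the secondary API endpoint second.
// Hypothetical origin IDs; they must match the Id of origins defined in the distribution.
const hybridGroup = buildOriginGroup(
  "hybrid-failover-group",
  "r53-failover-origin",
  "secondary-api-origin",
  [500, 502, 503, 504],
);
```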

Prerequisites

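For this walkthrough, you should have the following:

  • An AWS account, with credentials configured for the AWS CLI and AWS CDK
  • Git, Node.js, and npm installed
  • A public domain name hosted on Amazon Route 53, and the ID of its public hosted zone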

Deployment

The deployment of the solution will take approximately 10 minutes.

  1. Clone the CDK project from our GitHub repository:
git clone https://github.com/aws-samples/cloudfront-hybrid-origin-failover.git
cd cloudfront-hybrid-origin-failover
  2. Install the AWS CDK CLI and the project dependencies:
npm install -g aws-cdk
npm install
  3. Deploy the stack to your primary and fallback Regions:
./deployment/deploy.sh AWS_REGION AWS_BACKUP_REGION DOMAIN_NAME HOSTED_ZONE_ID

The script requires the following arguments:

  • AWS_REGION: your primary Region
  • AWS_BACKUP_REGION: your fallback Region
  • DOMAIN_NAME: your public domain name. This stack requires a public domain name hosted on Amazon Route 53
  • HOSTED_ZONE_ID: the ID of the Route 53 public hosted zone for that domain name

Deployment example

./deployment/deploy.sh eu-west-1 us-east-1 mydomain.com Z0XXXXXXXXXXXX
  4. At the end of the deployment, the FQDNs of the two CloudFront distributions are exported as AWS CloudFormation outputs:
    • CloudFront distribution with the Route 53 failover DNS record as origin
      • Export Name = R53-Failover-Distrib-Domain
    • CloudFront distribution with hybrid Route 53 failover and CloudFront origin failover
      • Export Name = Hybrid-Failover-Distrib-Domain

Outputs:

CdkRegionStack.HybridFailoverDistribDomain = https://XXXXXXX.cloudfront.net/prod
CdkRegionStack.R53FailoverDistribDomain = https://YYYYYYYY.cloudfront.net/prod

In addition to the terminal’s output, you can find the exported outputs on CloudFormation’s console by selecting the created stack and navigating to the Outputs tab.

Figure 5. This screenshot illustrates CloudFormation's Stack outputs.
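To script the tests, you can also read these exports with the AWS CLI, for example aws cloudformation list-exports --query "Exports[?Name=='Hybrid-Failover-Distrib-Domain'].Value" --output text, run in the Region of the stack that created the exports. If you prefer to process the full JSON output of aws cloudformation list-exports, the following sketch looks up an export by name; the function name is hypothetical and the sample values are placeholders.

```typescript
// Minimal sketch: read the JSON printed by `aws cloudformation list-exports`
// and look up one of the export names created by the stack.
interface CfnExport {
  ExportingStackId: string;
  Name: string;
  Value: string;
}

interface ListExportsOutput {
  Exports: CfnExport[];
}

function findExportValue(listExportsJson: string, exportName: string): string {
  const output = JSON.parse(listExportsJson) as ListExportsOutput;
  const match = (output.Exports ?? []).find((e) => e.Name === exportName);
  if (match === undefined) {
    throw new Error(`CloudFormation export "${exportName}" not found`);
  }
  return match.Value;
}

// Placeholder input shaped like list-exports output.
const sampleJson = JSON.stringify({
  Exports: [
    { ExportingStackId: "arn:aws:cloudformation:eu-west-1:111122223333:stack/example/id", Name: "R53-Failover-Distrib-Domain", Value: "https://YYYYYYYY.cloudfront.net/prod" },
    { ExportingStackId: "arn:aws:cloudformation:eu-west-1:111122223333:stack/example/id", Name: "Hybrid-Failover-Distrib-Domain", Value: "https://XXXXXXX.cloudfront.net/prod" },
  ],
});
const hybridUrl = findExportValue(sampleJson, "Hybrid-Failover-Distrib-Domain");
```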

Solution testing

To test both failover solutions, use the bash script included in the repository, passing one of the previously exported CloudFront distribution URLs as its argument.

  1. Start the test script:
./testing/test.sh https://<R53Failover/Hybrid-CF-Distrib>.cloudfront.net/prod
  2. While the script is running, simulate a failure of the primary origin by changing the status code returned by the primary API endpoint in the Lambda console:
    • In your primary Region, locate the Lambda function created by the stack. Its name starts with CdkCloudFrontFailover-PRIMARYappContentHandler.
    • Open the Configuration tab, then select Environment variables. Change the value of StatusCodeVar from 200 to an error code, for example 502.
Figure 6. This screenshot illustrates how to change the value of the StatusCodeVar environment variable.
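Changing StatusCodeVar simulates an outage because the primary API endpoint returns whatever status code the Lambda function reads from that variable. Both CloudFront (which, judging by the hybrid test below, treats 502 as a failover status code) and the Route 53 health check (HTTP and HTTPS checks consider only 2xx and 3xx responses healthy) then see a failing origin. The following is a hypothetical sketch of such a handler, not the code from the repository; the REGION_LABEL variable is an assumption used to tag responses.

```typescript
// Hypothetical content handler, for illustration only; the repository's actual
// handler may differ. REGION_LABEL is an assumed variable, not one created by the stack.
interface ProxyResult {
  statusCode: number;
  body: string;
}

function contentHandler(env: Record<string, string | undefined>): ProxyResult {
  const raw = env["StatusCodeVar"] ?? "200";
  const statusCode = Number.parseInt(raw, 10);
  if (!Number.isInteger(statusCode) || statusCode < 100 || statusCode > 599) {
    // Treat a malformed setting as a server error rather than crashing.
    return { statusCode: 500, body: JSON.stringify({ error: `invalid StatusCodeVar: ${raw}` }) };
  }
  return { statusCode, body: JSON.stringify({ origin: env["REGION_LABEL"] ?? "UNKNOWN" }) };
}

// In a Lambda function you would pass process.env; here the environment is a literal.
const healthy = contentHandler({ StatusCodeVar: "200", REGION_LABEL: "PRIMARY" });
const broken = contentHandler({ StatusCodeVar: "502", REGION_LABEL: "PRIMARY" });
// healthy.statusCode === 200, broken.statusCode === 502
```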

Testing example with Route 53 DNS failover:

./testing/test.sh https://<R53Failover-CF-Distrib>/prod
---------------------------------------------
req# | status | timestamp | statusCode | TTFB
---------------------------------------------
1,PRIMARY,2022-11-12 00:43:29,200,0.126647
2,PRIMARY,2022-11-12 00:43:30,200,0.132606
...
33,PRIMARY,2022-11-12 00:44:06,200,0.134515
34,DOWN,2022-11-12 00:44:08,502      <-- Changed Lambda status code to 502
35,DOWN,2022-11-12 00:44:09,502
...
114,DOWN,2022-11-12 00:45:44,502
115,DOWN,2022-11-12 00:45:45,502
116,SECONDARY,2022-11-12 00:45:46,200,0.394419 <-- R53 failover kicked in (82 failed requests over ~98 seconds)
117,SECONDARY,2022-11-12 00:45:48,200,0.395510
118,SECONDARY,2022-11-12 00:45:49,200,0.389113
...

Note that the Route 53 failover takes time to kick in: until the health check marks the primary endpoint as unhealthy, Route 53 keeps returning the primary record, and every request fails with a 502.
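To measure how long the outage lasted, you can parse the script's output. The sketch below assumes the comma-separated format shown in the listing (req#,status,timestamp,statusCode,TTFB), skips header and elided lines, and ignores "<--" annotations like those in the listings; the function names are hypothetical.

```typescript
// Minimal sketch: measure the outage window from test output rows in the format
// req#,status,timestamp,statusCode[,TTFB]; "<--" annotations are ignored.
interface TestRow {
  req: number;
  label: string; // PRIMARY, SECONDARY or DOWN
  timeMs: number;
  statusCode: number;
}

function parseTestRow(line: string): TestRow {
  const fields = line.split("<--")[0].split(",").map((f) => f.trim());
  // Timestamps carry no zone; parsing them as UTC only serves to compute differences.
  const timeMs = Date.parse(fields[2].replace(" ", "T") + "Z");
  return { req: Number(fields[0]), label: fields[1], timeMs, statusCode: Number(fields[3]) };
}

function outageSeconds(lines: string[]): number | undefined {
  const rows = lines.filter((l) => /^\d+,/.test(l)).map(parseTestRow); // data rows only
  const firstDown = rows.find((r) => r.statusCode !== 200);
  if (firstDown === undefined) return undefined; // no failure observed
  const recovered = rows.find((r) => r.req > firstDown.req && r.statusCode === 200);
  if (recovered === undefined) return undefined; // still down at the end of the log
  return (recovered.timeMs - firstDown.timeMs) / 1000;
}
```

Fed with the rows shown in the Route 53 example (requests 33, 34, 35, 115, and 116), it returns 98: the time between the first 502 and the first successful SECONDARY response.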

Testing example with CloudFront hybrid origin failover:

Before running the test again, roll back the StatusCodeVar variable of the primary Lambda function to 200, and wait until the Route 53 health check reports the primary endpoint as healthy again.

./testing/test.sh https://<Hybrid-Failover-CF-Distrib>/prod
---------------------------------------------
req# | status | timestamp | statusCode | TTFB
---------------------------------------------
1,PRIMARY,2022-11-12 00:55:19,200,0.168231
2,PRIMARY,2022-11-12 00:55:20,200,0.123779
...
14,PRIMARY,2022-11-12 00:55:34,200,0.069994
15,SECONDARY,2022-11-12 00:55:36,200,0.827250 <-- Changed Lambda status code to 502
16,SECONDARY,2022-11-12 00:55:37,200,0.486308
17,SECONDARY,2022-11-12 00:55:39,200,0.421217
18,SECONDARY,2022-11-12 00:55:40,200,0.497715
19,SECONDARY,2022-11-12 00:55:42,200,0.490106
20,SECONDARY,2022-11-12 00:55:43,200,0.494140
...

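The test demonstrates how CloudFront origin failover maintains the application's availability while Route 53 fails over to the secondary origin: none of the requests shown returns an error. The trade-off is a higher TTFB during this window (0.42 to 0.83 seconds for the SECONDARY requests shown, versus 0.07 to 0.17 seconds for the PRIMARY ones), which includes the round trip to the backup Region and the failed first attempt against the primary origin.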
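To quantify this latency cost, you can average the TTFB column per status label. This hypothetical helper follows the same output format assumptions as the test listings; rows without a TTFB (failed requests), headers, and elided lines are skipped.

```typescript
// Minimal sketch: average TTFB per status label from test output rows
// (req#,status,timestamp,statusCode,TTFB). Failed rows have no TTFB and are skipped.
function meanTtfbByLabel(lines: string[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const line of lines) {
    const fields = line.split("<--")[0].split(",").map((f) => f.trim());
    if (!/^\d+$/.test(fields[0]) || fields.length < 5) continue; // header or failed request
    const ttfb = Number(fields[4]);
    if (Number.isNaN(ttfb)) continue;
    const acc = totals[fields[1]] ?? { sum: 0, count: 0 };
    acc.sum += ttfb;
    acc.count += 1;
    totals[fields[1]] = acc;
  }
  const means: Record<string, number> = {};
  for (const label of Object.keys(totals)) {
    means[label] = totals[label].sum / totals[label].count;
  }
  return means;
}
```

Running it on both test outputs and comparing the SECONDARY averages gives a rough idea of the extra latency added by the failed first attempt during the detection window. With only a handful of rows, treat the result as indicative.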

Clean Up

  1. Destroy the stacks in your primary and fallback Regions:
./deployment/destroy.sh AWS_REGION AWS_BACKUP_REGION DOMAIN_NAME HOSTED_ZONE_ID
  2. Confirm from the outputs that the stack was successfully destroyed in both Regions:
✅ CdkCloudFrontFailover: destroyed

Destroy example:

./deployment/destroy.sh eu-west-1 us-east-1 mydomain.com Z0XXXXXXXXXXXX

Conclusion

The availability and stability of customer-facing workloads are critical for maintaining a positive user experience. To achieve high availability, you can introduce redundancy in the origin infrastructure. In this post, you learned how to use the included solution to set up two CloudFront distributions with different failover mechanisms. By testing them, you can observe how combining CloudFront and Route 53 capabilities helps keep your workloads highly available without significantly affecting application performance.

Costs and further reading

Basic testing of the provided AWS CDK solution costs under $10 per month.

About the Authors

Chakib Sahraoui

Chakib Sahraoui is a Senior Technical Account Manager based in Paris. He provides advocacy and guidance to help customers plan, build, and operate solutions using AWS best practices. Chakib is also passionate about Edge Services and how they help customers deliver secure, reliable, and fast online content.

Abhinav Bannerjee

Abhinav is a Solutions Architect based out of Texas. He works closely with customers across industries to help them scale their businesses using Amazon Web Services. He is also focused on helping customers make the most of AWS Edge services for content acceleration and perimeter protection.