Improve web application availability with CloudFront and Route 53 hybrid origin failover
Earlier this year, we released technical guidance regarding three advanced design patterns for highly available applications using Amazon CloudFront and Amazon Route 53. In this post, we dive deeper into CloudFront origin failover, Amazon Route 53 DNS failover, and the hybrid origin failover approach to further enhance the availability of your web applications. We also provide an AWS Cloud Development Kit (AWS CDK) solution that you can use to implement and test different high-availability patterns.
Origin Failover feature in CloudFront
CloudFront allows customers to configure primary and secondary origins within an origin group, and specify the HTTP error codes that trigger a failover. When CloudFront receives one of the configured HTTP error codes from the primary origin (e.g., a server error), or cannot reach the primary origin at all, it retries the original request against the secondary origin.
This native CloudFront failover is stateless: CloudFront doesn’t track the state of the origin’s health, so every incoming request is first routed to the primary origin. The request to the primary origin must time out or return an HTTP status code configured for failover before CloudFront attempts the request with the secondary origin in the group. As a consequence, although this failover is immediate, it adds latency to each request for as long as the primary origin is failing. Furthermore, note that while you can configure the cache behavior to allow other methods, CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS.
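The stateless behavior described above can be modeled in a few lines. The following is a minimal sketch, not CloudFront's implementation: `fetch_with_origin_group`, the origin callables, and the default set of failover status codes are hypothetical names and values chosen for illustration. An origin callable returns an HTTP status code, or `None` to represent a timeout.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# Methods for which CloudFront attempts origin failover, per the text above.
FAILOVER_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

# Illustrative assumption: the status codes configured on the origin group.
DEFAULT_FAILOVER_CODES = frozenset({500, 502, 503, 504})

# An origin returns a status code, or None when the request times out.
Origin = Callable[[], Optional[int]]


def fetch_with_origin_group(
    method: str,
    primary: Origin,
    secondary: Origin,
    failover_codes: FrozenSet[int] = DEFAULT_FAILOVER_CODES,
) -> Tuple[str, Optional[int]]:
    """Model one request through an origin group; returns (origin_used, status)."""
    # Stateless: every request goes to the primary first, even if it just failed.
    status = primary()
    primary_failed = status is None or status in failover_codes
    if primary_failed and method.upper() in FAILOVER_METHODS:
        # Only GET, HEAD, and OPTIONS are retried against the secondary.
        return "secondary", secondary()
    return "primary", status
```

In this model, a GET against a primary that returns 502 is answered by the secondary, while a POST receives the 502 directly. Because no state is kept between calls, each request during an outage pays for the failed primary attempt before the retry.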
Route 53 DNS Failover
Alternatively, you can leverage Route 53 Failover Routing Policies with Health Checks to implement a stateful failover mechanism for your origin. In this scenario, Route 53 responds to DNS queries for the origin domain name with IP records of the primary origin when it’s detected as healthy. If it becomes unhealthy and the secondary is healthy, then Route 53 automatically updates and responds with the secondary IP record. Note that if both the primary and secondary origins are unhealthy, then it returns the primary IP record.
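The record selection rules in the previous paragraph can be summarized as a small decision function. This is a sketch of the described behavior only; `resolve_failover_record` is a hypothetical name, and the health flags stand in for the results of the Route 53 health checks.

```python
def resolve_failover_record(primary_healthy: bool, secondary_healthy: bool) -> str:
    """Return which record a failover routing policy answers with."""
    if primary_healthy:
        return "primary"
    if secondary_healthy:
        return "secondary"
    # Both unhealthy: the primary record is returned, as noted above.
    return "primary"
```

The only case that yields the secondary record is an unhealthy primary combined with a healthy secondary.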
Failover delay depends on the health check’s polling interval and failure threshold. Note that your service/application could be unavailable during the transition to an unhealthy state. The failure threshold and polling interval are adjustable within the health check settings and should be configured according to your application’s requirements.
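As a rough worked example of that dependency, the detection time can be approximated as the polling interval multiplied by the failure threshold. The function below is a back-of-the-envelope sketch with hypothetical names; it assumes failures are counted on consecutive polls, and the optional `dns_ttl_s` term is an assumption that accounts for resolvers caching the previous answer.

```python
def estimated_failover_delay_seconds(
    request_interval_s: int,
    failure_threshold: int,
    dns_ttl_s: int = 0,
) -> int:
    """Approximate time before DNS answers switch to the secondary record."""
    # Time for the health check to observe enough consecutive failures.
    detection_s = request_interval_s * failure_threshold
    # Cached answers may keep pointing at the primary until the TTL expires.
    return detection_s + dns_ttl_s


# A 30-second interval with a failure threshold of 3 gives about 90 seconds.
print(estimated_failover_delay_seconds(30, 3))
```

Shortening the interval or lowering the threshold reduces the delay, at the cost of reacting to shorter, possibly transient, failures.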
In summary, CloudFront Origin Failover fails over immediately when it detects a failure from the origin. However, it may also introduce latency as it tries to forward every request to the primary origin first.
Route 53 DNS Failover offers more stability, but it takes longer to detect an origin failure. You can combine both solutions to increase availability without affecting performance.
Hybrid CloudFront and Route 53 failover for better availability
The following solution uses Route 53 to configure a Failover Policy that covers both of your origins with a single origin domain name. Next, it sets up a CloudFront origin group with the previously created domain name as the Primary, and your backup endpoint as the Secondary origin.
The advantage of this setup is that, during the time Route 53 needs to detect the failure and fail over, CloudFront’s origin failover feature immediately retries requests against the secondary origin, which increases the application’s availability. As mentioned earlier, this pattern only works with the GET, HEAD, or OPTIONS HTTP methods.
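To make the interaction between the two mechanisms concrete, here is a small timeline model for GET requests. It is a simplified sketch under stated assumptions: `serve` and its parameters are hypothetical, the primary fails at `primary_fails_at` seconds and stays down, Route 53 switches records `dns_failover_delay` seconds later, and the secondary is always healthy.

```python
from typing import Tuple


def serve(
    t: float,
    primary_fails_at: float,
    dns_failover_delay: float,
    hybrid: bool,
) -> Tuple[str, int, int]:
    """Return (served_by, status_code, origin_attempts) for a GET at time t."""
    primary_up = t < primary_fails_at
    dns_failed_over = t >= primary_fails_at + dns_failover_delay
    # What the Route 53 failover domain name resolves to at time t.
    dns_target = "secondary" if dns_failed_over else "primary"

    if dns_target == "secondary" or primary_up:
        # The first attempt succeeds, so no origin-group retry is needed.
        return dns_target, 200, 1
    if hybrid:
        # DNS still points at the failed primary: the origin group retries
        # the request against the backup endpoint.
        return "secondary", 200, 2
    # Route 53 failover alone: viewers see errors until DNS switches.
    return "primary", 502, 1
```

With `primary_fails_at=10` and `dns_failover_delay=90`, a request at `t=50` returns 502 without the origin group and 200 (after two origin attempts) with it. From `t=100` onward both setups serve from the secondary in a single attempt, which is why the extra latency of the retry is limited to the detection window.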
The solution will achieve the following:
- Create an API Endpoint using Amazon API Gateway and AWS Lambda on both the Primary and Backup Regions (with custom domain name + certificate)
- Create a Route 53 health check for both API Endpoints
- Create a Route 53 DNS entry, with an Alias for both the Primary and Secondary API Endpoint
- Create two (2) CloudFront Distributions with the following setup:
  - Setup 1: Configured with the Route 53 failover DNS record as Origin
  - Setup 2: Configured with an Origin failover group, with the Route 53 failover DNS record as primary and the secondary API Gateway endpoint as a fallback
- Export both CloudFront distributions’ domain names to let you test both solutions
Prerequisites
For this walkthrough, you should have the following:
- An AWS Account
- A public domain hosted on Amazon Route 53
- Permissions to create origin records and health checks in Route 53
- Permissions to create or update AWS Identity and Access Management (IAM) roles, AWS Certificate Manager (ACM) public certificates, CloudFront distributions, API Gateway configurations, and Lambda functions in two different regions
- AWS CDK Toolkit installed
npm install -g aws-cdk
Deployment
The deployment of the solution will take approximately 10 minutes.
- We start by downloading the CDK template from our GitHub repository.
git clone https://github.com/aws-samples/cloudfront-hybrid-origin-failover.git
cd cloudfront-hybrid-origin-failover
- Install CDK and the required dependencies.
npm install -g aws-cdk
npm install
- Deploy the stack to your Primary and Fallback Region.
./deployment/deploy.sh AWS_REGION AWS_BACKUP_REGION DOMAIN_NAME HOSTED_ZONE_ID
You must input the following required arguments:
- AWS_REGION: Define your Primary Region
- AWS_BACKUP_REGION: Define your Fallback Region
- DOMAIN_NAME: This stack requires that you have a public domain name hosted on Amazon Route 53. Provide your domain name
- HOSTED_ZONE_ID: This stack requires that you have a public domain name hosted on Amazon Route 53. Provide your Hosted Zone ID
Deployment example
./deployment/deploy.sh eu-west-1 us-east-1 mydomain.com Z0XXXXXXXXXXXX
- At the end of the deployment, the FQDN of the two created CloudFront distributions will be exported as an AWS CloudFormation output:
  - CloudFront Distribution with Route 53 failover DNS record as origin
    - Export Name = R53-Failover-Distrib-Domain
  - CloudFront Distribution with Hybrid Route 53 Failover with CloudFront Origin Failover
    - Export Name = Hybrid-Failover-Distrib-Domain
Outputs:
CdkRegionStack.HybridFailoverDistribDomain = https://XXXXXXX.cloudfront.net/prod
CdkRegionStack.R53FailoverDistribDomain = https://YYYYYYYY.cloudfront.net/prod
In addition to the terminal’s output, you can find the exported outputs on CloudFormation’s console by selecting the created stack and navigating to the Outputs tab.
Solution testing
To test both failover solutions, you can use the following bash script. You must provide the previously exported CloudFront Distribution URL.
- To start testing, you must execute the following script:
./testing/test.sh https://<R53Failover/Hybrid-CF-Distrib>.cloudfront.net/prod
- To simulate the failure of the Primary node, you can change the status code returned by the primary API endpoint through the Lambda console:
  - In your primary region, locate the Lambda function that was created by the stack. Its name starts with CdkCloudFrontFailover-PRIMARYappContentHandler.
  - Navigate to the Configuration tab, then locate the Environment variables. Edit the StatusCodeVar variable from 200 to 502, for instance.
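For context, a handler that behaves as described above might look like the following. This is a hypothetical sketch and not the function deployed by the stack: only the `StatusCodeVar` variable name comes from this post, while the handler name, the default value, and the response body are assumptions. The return value follows the Lambda proxy integration response shape that API Gateway expects.

```python
import json
import os


def handler(event, context):
    """Return the HTTP status code configured in the StatusCodeVar variable."""
    # Assumption: fall back to 200 when the variable is not set.
    status_code = int(os.environ.get("StatusCodeVar", "200"))
    return {
        "statusCode": status_code,
        # Illustrative body only; the real function's payload may differ.
        "body": json.dumps({"statusCode": status_code}),
    }
```

Because the value is read on each invocation, editing the environment variable is enough to make the primary endpoint start returning errors without redeploying code.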
Testing example with Route 53 DNS Origin Failover:
./testing/test.sh https://<R53Failover-CF-Distrib>/prod
---------------------------------------------
req# | status | timestamp | statusCode | TTFB
---------------------------------------------
1,PRIMARY,2022-11-12 00:43:29,200,0.126647
2,PRIMARY,2022-11-12 00:43:30,200,0.132606
...
33,PRIMARY,2022-11-12 00:44:06,200,0.134515
34,DOWN,2022-11-12 00:44:08,502 <-- Changed Lambda status code to 502
35,DOWN,2022-11-12 00:44:09,502
...
114,DOWN,2022-11-12 00:45:44,502
115,DOWN,2022-11-12 00:45:45,502
116,SECONDARY,2022-11-12 00:45:46,200,0.394419 <-- R53 failover kicked in (82 failed requests, about 98 seconds)
117,SECONDARY,2022-11-12 00:45:48,200,0.395510
118,SECONDARY,2022-11-12 00:45:49,200,0.389113
...
Note that it takes time for the Route 53 failover to kick in. This is the time required by the health check to mark the endpoint as unhealthy.
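If you save the script's output, you can measure the outage window instead of counting rows by hand. The helper below is a minimal sketch that assumes the comma-separated row layout shown in the example above (request number, status, timestamp, status code, optional TTFB); `outage_seconds` is a hypothetical name and is not part of the repository.

```python
from datetime import datetime
from typing import Iterable, Optional


def outage_seconds(lines: Iterable[str]) -> Optional[float]:
    """Seconds between the first DOWN row and the next row that is not DOWN."""
    down_start = None
    for line in lines:
        # Drop trailing annotations such as "<-- Changed Lambda status code".
        fields = line.split("<--")[0].strip().split(",")
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip headers, separators, and "..." lines
        status = fields[1]
        timestamp = datetime.strptime(fields[2], "%Y-%m-%d %H:%M:%S")
        if status == "DOWN" and down_start is None:
            down_start = timestamp
        elif status != "DOWN" and down_start is not None:
            return (timestamp - down_start).total_seconds()
    return None  # no outage, or the outage never ended in this log
```

Applied to the rows above, the first DOWN row is at 00:44:08 and the first SECONDARY row is at 00:45:46, so the function returns 98.0. A log with no DOWN rows returns `None`.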
Testing example with CloudFront Hybrid Origin Failover:
Before running the test again, you should first roll back the StatusCodeVar variable of the Lambda function to 200.
./testing/test.sh https://<Hybrid-Failover-CF-Distrib>/prod
---------------------------------------------
req# | status | timestamp | statusCode | TTFB
---------------------------------------------
1,PRIMARY,2022-11-12 00:55:19,200,0.168231
2,PRIMARY,2022-11-12 00:55:20,200,0.123779
...
14,PRIMARY,2022-11-12 00:55:34,200,0.069994
15,SECONDARY,2022-11-12 00:55:36,200,0.827250 <-- Changed Lambda status code to 502
16,SECONDARY,2022-11-12 00:55:37,200,0.486308
17,SECONDARY,2022-11-12 00:55:39,200,0.421217
18,SECONDARY,2022-11-12 00:55:40,200,0.497715
19,SECONDARY,2022-11-12 00:55:42,200,0.490106
20,SECONDARY,2022-11-12 00:55:43,200,0.494140
...
This test demonstrates how CloudFront Origin Failover helps maintain the application’s availability during the time Route 53 needs to fail over to the secondary node. Requests keep returning 200, at the cost of a higher TTFB while they are served from the secondary.
Clean Up
- Destroy the stack from your Primary and Fallback Region:
./deployment/destroy.sh AWS_REGION AWS_BACKUP_REGION DOMAIN_NAME HOSTED_ZONE_ID
- Confirm from the outputs that the stack was successfully destroyed on both regions:
✅ CdkCloudFrontFailover: destroyed
Destroy example:
./deployment/destroy.sh eu-west-1 us-east-1 mydomain.com Z0XXXXXXXXXXXX
Conclusion
The availability and stability of customer-facing workloads are critical for maintaining a positive user experience. To achieve high availability, we can introduce redundancy in the origin infrastructure. In this post, you learned how to use the included solution to set up two CloudFront distributions with different failover mechanisms. Testing them side by side shows how combining the capabilities of CloudFront and Route 53 can help keep your workloads highly available without affecting application performance.
Costs and further reading
Basic testing of the provided AWS CDK solution will cost under $10/month.
- Amazon Route 53 pricing
- Three advanced design patterns for high available applications using Amazon CloudFront
- Enhanced origin failover using Amazon CloudFront and AWS Lambda@Edge