Tenant routing strategies for SaaS applications on AWS
A key challenge for SaaS providers is designing secure, scalable tenant routing mechanisms to identify tenants and route requests to appropriate resources. Effective tenant routing ensures isolation, scalability, and security. This post explores strategies for routing HTTP requests in multi-tenant SaaS environments on AWS, including considerations, best practices, and example scenarios.
For routing strategies at the transport layer, see Approaches to Transport Layer Tenant Routing for SaaS.
Overview of tenant routing in SaaS
Tenant routing depends on your SaaS architecture model. In the pool model, multiple tenants share infrastructure resources. In this case, tenant routing is not necessary because the same shared resources serve all tenants. In contrast, the silo model dedicates infrastructure resources to each tenant. In this case, effective tenant routing becomes essential for directing incoming requests to tenant-specific resources.
The bridge model combines the silo and pool models. You can apply this mix across tenants based on your tiering strategy, or across the microservices in your architecture, which may mean using multiple routing mechanisms.
Tenant routing strategies
There are two broad categories of tenant routing strategies:
- Domain-driven routing uses the Domain Name System (DNS) to determine routing. SaaS providers can assign each tenant a subdomain or prefix of the domain name that serves the application.
- Data-driven routing uses information within incoming HTTP requests to determine routing. SaaS providers can leverage various HTTP headers, request parameters, or cookies to drive the routing logic.
Domain-driven routing
With domain-driven routing, you assign each tenant a unique hostname. A common approach is to use a subdomain per tenant (for example, tenant1.example.com). A key advantage of domain-driven routing is simplicity. Because you rely on existing DNS infrastructure, you obtain both the tenant context and the routing target from the hostname alone. However, this approach may lack the flexibility and customization needed for complex scenarios, especially as the application evolves over time.
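As a minimal sketch, assuming each tenant is a first-level subdomain of a hypothetical base domain example.com, any component that sees the request could derive the tenant from the Host header like this. The function name and the rule of rejecting nested subdomains are assumptions of this example, not part of any AWS API.

```python
from __future__ import annotations


def tenant_from_host(host: str, base_domain: str = "example.com") -> str | None:
    """Return the tenant label from a host such as 'tenant1.example.com'.

    Assumes one tenant per first-level subdomain of base_domain; returns None
    for the bare domain, nested subdomains, or unrelated hosts.
    """
    # Host headers can carry a port and are case-insensitive; drop a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain.lower()
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Reject empty labels and nested subdomains such as a.b.example.com.
    if not label or "." in label:
        return None
    return label
```

For example, `tenant_from_host("Tenant1.Example.com:443")` yields `"tenant1"`, while the bare `example.com` yields `None`.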
Considerations and best practices
Vanity domains – To offer branding benefits, SaaS providers often allow tenants to select a custom, personalized vanity subdomain. For faster tenant onboarding, you may initially generate a unique, random vanity subdomain for each tenant, and then allow customization later.
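The sketch below generates a random placeholder subdomain at sign-up and validates a name the tenant picks later. The 1 to 63 character limit and the letters, digits, and hyphens rule follow DNS hostname label conventions; the lowercase-only policy and the hypothetical `RESERVED` set are assumptions of this example.

```python
from __future__ import annotations

import re
import secrets
import string

# One DNS label: letters, digits, and hyphens, 1-63 characters, no leading or
# trailing hyphen. Restricting to lowercase is a policy choice for this sketch.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")
_ALPHABET = string.ascii_lowercase + string.digits
RESERVED = {"www", "api", "admin", "app"}  # hypothetical names kept for the provider


def random_vanity_subdomain(taken: set[str], length: int = 10) -> str:
    """Generate an unused random label to use until the tenant picks a name."""
    if not 1 <= length <= 63:
        raise ValueError("a DNS label is 1 to 63 characters long")
    while True:
        # Start with a letter (a stylistic choice), then letters and digits.
        candidate = secrets.choice(string.ascii_lowercase) + "".join(
            secrets.choice(_ALPHABET) for _ in range(length - 1)
        )
        if candidate not in taken and candidate not in RESERVED:
            return candidate


def is_valid_vanity(name: str, taken: set[str]) -> bool:
    """Check a tenant-chosen subdomain before accepting it."""
    return (
        _LABEL.fullmatch(name) is not None
        and name not in RESERVED
        and name not in taken
    )
```

Using `secrets` rather than `random` keeps the placeholder names hard to guess, which helps if you do not want tenant URLs to be enumerable.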
Bring Your Own Domain (BYOD) – To offer additional branding benefits, SaaS providers can let tenants use their own domain, such as an apex domain. However, this is less common due to the added complexity. With subdomains, the SaaS provider controls domain ownership and TLS certificates. In contrast, with a tenant-owned domain, the tenant may need to manage the domain themselves. Weigh the branding benefits against the additional processes for domain delegation, verification, and TLS certificates.
For example, you can automate TLS certificate renewals using a service such as AWS Certificate Manager (ACM). This requires an initial email or DNS validation to prove domain ownership, which you typically request during tenant onboarding. However, automation is not always possible. For example, enterprise customers may require manual processes to maintain strict control over their domain and TLS certificates.
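As a sketch of the DNS validation step, assuming onboarding uses the AWS SDK for Python (boto3), the helpers below build the keyword arguments for ACM `RequestCertificate` and pull out the CNAME records the tenant must publish from a `DescribeCertificate` response. The helper names and the `tenant-id` tag key are hypothetical; the request and response field names are ACM's.

```python
from __future__ import annotations


def acm_request_params(tenant_id: str, domain: str) -> dict:
    """Keyword arguments for ACM RequestCertificate using DNS validation."""
    return {
        "DomainName": domain,
        "ValidationMethod": "DNS",
        # Hypothetical tag so certificates can be traced back to tenants.
        "Tags": [{"Key": "tenant-id", "Value": tenant_id}],
    }


def dns_validation_records(describe_response: dict) -> list[tuple[str, str, str]]:
    """Extract (name, type, value) records from a DescribeCertificate response."""
    options = describe_response["Certificate"].get("DomainValidationOptions", [])
    records = []
    for option in options:
        record = option.get("ResourceRecord")
        # ResourceRecord can be absent until ACM has generated it; skip those.
        if record:
            records.append((record["Name"], record["Type"], record["Value"]))
    return records
```

In use, this might look like `arn = acm.request_certificate(**acm_request_params("tenant1", "app.tenant-corp.com"))["CertificateArn"]` followed by `dns_validation_records(acm.describe_certificate(CertificateArn=arn))`, where `acm = boto3.client("acm", region_name="us-east-1")`. Certificates used with CloudFront must be requested in US East (N. Virginia), and the records are then shared with the tenant, who publishes them in their own DNS.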
Tenant Directory – Domain-driven routing relies on users remembering their tenant URL. However, some users may forget or need additional guidance if they visit your SaaS solution directly. As such, consider creating an experience that allows users to identify their tenant based on their email address or user identifier. An example of such an experience is shown below.
Figure 2: Example user experiences that you can provide to your tenants as part of your SaaS solution
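One way to back such an experience is a lookup from the user's email domain to the tenant URL. This is a minimal sketch under stated assumptions: `TENANT_DIRECTORY` is a hypothetical in-code stand-in for your tenant management service, and matching on email domain only works for tenants that use their own corporate domain.

```python
from __future__ import annotations

# Hypothetical directory: email domain -> tenant subdomain. In practice this
# would live in your tenant management service, not in code.
TENANT_DIRECTORY = {
    "acme.com": "acme",
    "globex.com": "globex",
}


def tenant_url_for_email(
    email: str, directory: dict[str, str], base_domain: str = "example.com"
) -> str | None:
    """Map a user's email address to their tenant sign-in URL, if known."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain:
        return None  # not an email address
    tenant = directory.get(domain)
    return f"https://{tenant}.{base_domain}/" if tenant else None
```

Consider sending the resulting link by email rather than displaying it, so the page responds the same way whether or not a match exists and does not reveal which organizations are customers.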
Example scenario with Amazon Route 53 and Application Load Balancer
Implementing domain-driven routing depends on the AWS service you select for routing incoming requests. In this example, you use Application Load Balancer and configure listener rules with conditional logic based on the Host header of the incoming request. When you provision each tenant, you also configure routing so that requests for that tenant reach the appropriate tenant resources.
Figure 3: Example of domain-driven routing using listener rules in Application Load Balancer
DNS with Amazon Route 53 – HTTP requests use DNS as the initial entry point. In this example, you configure Amazon Route 53 with an alias DNS record that routes all traffic for the wildcard subdomain *.example.com to the Application Load Balancer.
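A sketch of the one-time DNS setup, assuming boto3: the function builds the `ChangeBatch` for Route 53 `ChangeResourceRecordSets` that points `*.example.com` at the load balancer through an alias A record. The domain, load balancer DNS name, and zone ID used below are placeholders. The alias target's `HostedZoneId` is the load balancer's own zone (its `CanonicalHostedZoneId` from `DescribeLoadBalancers`), not the ID of your hosted zone.

```python
def wildcard_alias_change_batch(
    zone_domain: str, alb_dns_name: str, alb_hosted_zone_id: str
) -> dict:
    """ChangeBatch that aliases *.<zone_domain> to an Application Load Balancer."""
    return {
        "Comment": "Wildcard alias for tenant subdomains",
        "Changes": [
            {
                # UPSERT makes the call safe to repeat.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": f"*.{zone_domain}",
                    "Type": "A",
                    "AliasTarget": {
                        # The load balancer's hosted zone ID, not your zone's ID.
                        "HostedZoneId": alb_hosted_zone_id,
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }
```

You would pass the result as `boto3.client("route53").change_resource_record_sets(HostedZoneId=<your zone ID>, ChangeBatch=...)`. Because the record is a wildcard, onboarding a tenant needs no further DNS changes.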
Host-based listener rules with Application Load Balancer – Each request that reaches the load balancer carries an HTTP Host header, for example, tenant1.example.com. A host-based listener rule matches this header and forwards the request to a target that corresponds to the resources dedicated to that tenant.
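A sketch of one tenant's rule, assuming boto3: the helper returns keyword arguments for Elastic Load Balancing `CreateRule` with a `host-header` condition and a `forward` action. The ARNs and host below are placeholders; the helper name is hypothetical.

```python
def host_rule_params(
    listener_arn: str, tenant_host: str, target_group_arn: str, priority: int
) -> dict:
    """Keyword arguments for ELBv2 CreateRule: one tenant host -> its target group."""
    # Listener rule priorities must be unique per listener, from 1 to 50000.
    if not 1 <= priority <= 50000:
        raise ValueError("rule priority must be between 1 and 50000")
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {"Field": "host-header", "HostHeaderConfig": {"Values": [tenant_host]}}
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```

These would be passed as `boto3.client("elbv2").create_rule(**host_rule_params(...))`. Requests that match no rule fall through to the listener's default action.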
Tenant onboarding – Consider tenant routing as part of tenant onboarding. For example, when a new SaaS customer signs up, your tenant onboarding service can determine, based on the customer's tier, whether new infrastructure needs to be provisioned. If so, you need to configure tenant routing in addition to provisioning the infrastructure. In this case, that means creating a host-based listener rule that points to the new tenant's infrastructure.
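The onboarding decision could be sketched as follows, assuming hypothetical tier names in which premium and enterprise tenants get silo resources and everyone else stays in the pool. The plan records whether to provision infrastructure and which listener rule priority the new host-based rule should take; actually provisioning resources and calling `CreateRule` are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tiers that receive dedicated (silo) infrastructure.
SILO_TIERS = {"premium", "enterprise"}
MAX_RULE_PRIORITY = 50000  # ALB listener rule priorities range from 1 to 50000


@dataclass
class OnboardingPlan:
    tenant_id: str
    host: str
    provision_silo: bool
    rule_priority: int | None  # None: tenant uses pooled resources, no new rule


def plan_onboarding(
    tenant_id: str, tier: str, used_priorities: set[int], base_domain: str = "example.com"
) -> OnboardingPlan:
    """Decide whether a new tenant needs silo resources and a host-based rule."""
    host = f"{tenant_id}.{base_domain}"
    if tier not in SILO_TIERS:
        # In this sketch, pooled tenants are served by the listener's default action.
        return OnboardingPlan(tenant_id, host, provision_silo=False, rule_priority=None)
    # Rule priorities must be unique per listener; take the lowest free one.
    priority = next(
        (p for p in range(1, MAX_RULE_PRIORITY + 1) if p not in used_priorities), None
    )
    if priority is None:
        raise RuntimeError("no free rule priority left on this listener")
    return OnboardingPlan(tenant_id, host, provision_silo=True, rule_priority=priority)
```

In practice, the per-load-balancer rule quota is usually reached long before priorities run out, which is one reason to plan for sharding as the tenant count grows.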
Note that this is just one example of a domain-driven routing approach. While it uses Application Load Balancer, the concepts are applicable to other technology implementations.
Data-driven routing
Now, let’s take a look at data-driven routing. For SaaS applications, a key design principle is to bind user identity to tenant identity. This produces a SaaS identity that flows through every layer of your system, making tenant context available wherever it is needed. In data-driven routing, you carry this identity in parts of the HTTP request, such as headers, cookies, URL paths (or URIs), or the request body. You then extract this data to dynamically direct incoming requests to the appropriate tenant’s resources.
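To make the idea concrete, here is a minimal sketch that looks for tenant context in a header, then a cookie, then the URL path. The header name `x-tenant-id`, cookie name `tenant`, path prefix `/tenants/`, and the precedence order are all hypothetical choices, and the request body is not covered.

```python
from __future__ import annotations

from http.cookies import SimpleCookie

# Hypothetical locations where the tenant context travels in a request.
TENANT_HEADER = "x-tenant-id"
TENANT_COOKIE = "tenant"
PATH_PREFIX = "/tenants/"


def tenant_from_request(headers: dict[str, str], path: str) -> str | None:
    """Look for tenant context in a header, then a cookie, then the URL path."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if lowered.get(TENANT_HEADER):
        return lowered[TENANT_HEADER].strip()
    if "cookie" in lowered:
        cookie = SimpleCookie()
        cookie.load(lowered["cookie"])
        if TENANT_COOKIE in cookie:
            return cookie[TENANT_COOKIE].value
    # Path form: /tenants/<tenant-id>/...
    if path.startswith(PATH_PREFIX):
        segment = path[len(PATH_PREFIX):].split("/", 1)[0]
        return segment or None
    return None
```

Values like these are supplied by the client, so before routing on them, check them against the authenticated identity, for example a tenant claim in a verified token.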
A key advantage of data-driven routing is flexibility. However, because it performs additional computation to make each routing decision, it requires careful design to manage the added complexity and operational overhead, particularly in large-scale SaaS environments.
Considerations and best practices
Tenant identity management – With data-driven routing, a common approach is to obtain tenant context during user authentication, for example from API keys or OAuth client IDs. However, this approach assumes that the SaaS application has a centralized identity service. In some cases, SaaS applications have a separate identity service for each tenant. In such scenarios, you need tenant context before authentication to determine the appropriate identity service, so domain-driven routing may be more suitable.
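The two cases can be contrasted in a short sketch. With a centralized identity service, the credential itself identifies the tenant (here, a hypothetical index from SHA-256 digests of API keys to tenant IDs). With per-tenant identity services, you must already know the tenant, for example from the hostname, to pick the right Amazon Cognito user pool issuer. The `https://cognito-idp.<region>.amazonaws.com/<userPoolId>` issuer format is Cognito's; the tenant-to-pool map is an assumption.

```python
from __future__ import annotations

import hashlib


def hash_api_key(api_key: str) -> str:
    """Store and look up keys by SHA-256 digest rather than in plaintext."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical index: digest of each issued API key -> tenant id.
API_KEY_TENANTS = {hash_api_key("demo-key-for-acme"): "acme"}


def tenant_for_api_key(api_key: str, key_index: dict[str, str]) -> str | None:
    """Centralized identity: the credential itself identifies the tenant."""
    return key_index.get(hash_api_key(api_key))


def issuer_for_tenant(tenant_id: str, tenant_pools: dict[str, tuple[str, str]]) -> str | None:
    """Per-tenant identity: choose the tenant's Cognito user pool before sign-in.

    tenant_id must already be known here, for example from the hostname, which
    is why per-tenant identity services pair naturally with domain-driven routing.
    """
    pool = tenant_pools.get(tenant_id)
    if pool is None:
        return None
    region, user_pool_id = pool
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
```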
Wildcard subdomains – Automation is key for SaaS solutions, and as such, a popular approach is to provide each tenant with a subdomain. With wildcard subdomains, you configure DNS once without the need for additional DNS operations every time a new tenant is onboarded. For example, you configure Amazon CloudFront with a wildcard alternate domain *.example.com. The service will route any traffic matching the pattern to the origin where the application resides. You can then extract the tenant context in your application, and route the request to the appropriate tenant. Wildcard subdomains are supported by services such as Amazon API Gateway, Application Load Balancer and AWS Amplify.
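As one possible shape for the extraction step, the sketch below is a Python Lambda@Edge viewer-request handler for a distribution with the wildcard alternate domain `*.example.com`. It copies the tenant label from the Host header into a hypothetical `x-tenant-id` header and returns a 404 for hosts that are not single-label tenant subdomains. The event and header structure is Lambda@Edge's; the header name and base domain are assumptions. Whether this header reaches your origin, and whether it is part of the cache key, depends on your cache and origin request policies; make sure the tenant is part of the cache key so that cached responses are never shared across tenants.

```python
from __future__ import annotations

BASE_DOMAIN = "example.com"  # assumption: matches the *.example.com alternate domain


def handler(event, context):
    """Lambda@Edge viewer-request sketch: expose the tenant label as a header."""
    request = event["Records"][0]["cf"]["request"]
    # Lambda@Edge lists headers by lowercase name, each as a list of key/value dicts.
    host = request["headers"]["host"][0]["value"].lower()
    suffix = "." + BASE_DOMAIN
    label = host[: -len(suffix)] if host.endswith(suffix) else ""
    if not label or "." in label:
        # Not a single-label tenant subdomain: answer directly from the edge.
        return {"status": "404", "statusDescription": "Not Found"}
    # Hypothetical header that downstream routing or the origin reads.
    request["headers"]["x-tenant-id"] = [{"key": "X-Tenant-Id", "value": label}]
    return request
```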
Cost and performance optimization with caching – Executing routing logic incurs computation cost and performance overhead, so consider minimizing how often it runs. For example, you might execute routing logic only during the initial authentication process, and let subsequent requests from the same user or tenant reuse the cached routing decision.
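A minimal in-process cache for routing decisions might look like this. The `resolve` callable stands in for whatever expensive lookup you use (for example, a key-value store query), and the 300-second TTL is an arbitrary assumption. In Lambda@Edge, a module-level instance survives across invocations only while the execution environment stays warm, so treat it as a best-effort cache.

```python
from __future__ import annotations

import time
from typing import Callable


class RoutingCache:
    """Cache tenant -> origin routing decisions for ttl_seconds."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # the expensive lookup being cached
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[str, tuple[str, float]] = {}

    def origin_for(self, tenant_id: str) -> str:
        now = self._clock()
        hit = self._entries.get(tenant_id)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh cached decision
        origin = self._resolve(tenant_id)
        self._entries[tenant_id] = (origin, now + self._ttl)
        return origin
```

A shorter TTL picks up routing changes, such as a tenant moving to new infrastructure, sooner, at the cost of more lookups.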
Advanced context – In some use cases, you may need tenant context beyond the tenant identity alone. This could include references to backend resources, application versions, or geolocation data extracted from CloudFront headers, which is particularly useful in multi-region architectures. In such scenarios, consider storing this richer tenant information in a low-latency key-value store, such as Amazon CloudFront KeyValueStore or Amazon DynamoDB, so that you can fetch these details efficiently for routing.
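As an illustration, a tenant record could hold per-Region origins, and the router could combine it with the viewer's country from the `CloudFront-Viewer-Country` header. The record shape and the country-to-Region table are hypothetical. Note that CloudFront KeyValueStore is read from CloudFront Functions, which are written in JavaScript, so a Python version like this maps more naturally to Lambda@Edge reading from DynamoDB.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical tenant record, e.g. stored as a JSON value keyed by tenant id."""
    tenant_id: str
    app_version: str
    home_region: str
    # Region -> origin domain, for tenants deployed in more than one Region.
    regional_origins: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from the CloudFront-Viewer-Country value to a Region.
COUNTRY_TO_REGION = {"DE": "eu-central-1", "FR": "eu-central-1", "JP": "ap-northeast-1"}


def pick_origin(ctx: TenantContext, viewer_country: str | None) -> str:
    """Prefer the tenant's deployment near the viewer, else its home Region."""
    region = COUNTRY_TO_REGION.get((viewer_country or "").upper(), ctx.home_region)
    origin = ctx.regional_origins.get(region)
    # The home Region is assumed to always have an origin.
    return origin if origin is not None else ctx.regional_origins[ctx.home_region]
```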
Example scenario with Amazon Route 53 and Amazon CloudFront
Like domain-driven routing, the implementation of data-driven routing depends on the service you choose to handle incoming requests. In this example, you use CloudFront as the first entry point to the application. You configure routing logic in Lambda@Edge that obtains the tenant context, for example, tenant-id, and uses it to conditionally select an origin (a service) to forward the request to.
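A sketch of the origin switch as a Python Lambda@Edge origin-request handler, which is the trigger where the origin can be changed. It assumes an earlier step has already placed a verified tenant ID in a hypothetical `x-tenant-id` header, and uses a hypothetical in-code `TENANT_ORIGINS` map. The `custom` origin fields are the ones Lambda@Edge expects; the Host header is rewritten to match the new origin.

```python
from __future__ import annotations

# Hypothetical tenant -> origin domain mapping.
TENANT_ORIGINS = {
    "tenant1": "tenant1-api.example.com",
    "tenant2": "tenant2-api.example.com",
}


def set_custom_origin(request: dict, domain: str) -> dict:
    """Point an origin request at a custom HTTPS origin."""
    request["origin"] = {
        "custom": {
            "domainName": domain,
            "port": 443,
            "protocol": "https",
            "path": "",
            "sslProtocols": ["TLSv1.2"],
            "readTimeout": 30,
            "keepaliveTimeout": 5,
            "customHeaders": {},
        }
    }
    # The Host header must match the new origin for TLS and virtual hosting.
    request["headers"]["host"] = [{"key": "Host", "value": domain}]
    return request


def handler(event, context):
    """Origin-request sketch: route to the tenant's origin or reject the request."""
    request = event["Records"][0]["cf"]["request"]
    tenant = request["headers"].get("x-tenant-id", [{}])[0].get("value")
    domain = TENANT_ORIGINS.get(tenant or "")
    if domain is None:
        return {"status": "403", "statusDescription": "Forbidden"}
    return set_custom_origin(request, domain)
```

Hard-coding the map means redeploying the function for every new tenant; keeping the mapping in a data store avoids that.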
Figure 4: Example of data-driven routing using Lambda@Edge on CloudFront
Identity with Amazon Cognito – To obtain tenant context, you first authenticate the user with an identity provider such as Amazon Cognito. Upon authentication, the identity provider issues a JSON Web Token (JWT) that identifies the tenant the user belongs to. The client then sends this token in the Authorization HTTP header of subsequent requests.
DNS with Amazon Route 53 – In data-driven routing, you might use a single domain, for example, www.example.com, to allow all your tenants to access your application and embed the tenant context in an HTTP header instead. In this example, you configure Amazon Route 53 with an alias DNS record such that it routes all traffic associated with www.example.com to the CloudFront distribution.
Auth-based routing with Lambda@Edge on CloudFront – Each request routed to your CloudFront distribution carries an HTTP Authorization header containing the encoded JWT previously issued by the identity provider. The JWT includes the tenant context of the user, for example, tenant-id:tenant1. You configure Lambda@Edge to decode the JWT and extract the tenant context. To verify the JWT's signature, you obtain the public key from the identity provider. Once the token is verified, you forward the request to an origin that belongs to the tenant. For more details on this process, see authorization with Lambda@Edge and JWT.
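The extraction step could be sketched as below. Verification is deliberately injected as a `verify` callable rather than reimplemented: in practice it might wrap PyJWT, using `jwt.PyJWKClient` on the Cognito JWKS URL (`https://cognito-idp.<region>.amazonaws.com/<userPoolId>/.well-known/jwks.json`) and `jwt.decode(token, key, algorithms=["RS256"], issuer=...)`. The claim name `tenant-id` follows the example above; with Cognito custom attributes it would typically carry a `custom:` prefix.

```python
from __future__ import annotations

from typing import Callable

# A verifier takes the raw JWT and returns its claims, raising if the signature,
# expiry, or issuer check fails. Supplying a real one is left to the caller.
Verifier = Callable[[str], dict]


def bearer_token(headers: dict) -> str | None:
    """Read 'Authorization: Bearer <jwt>' from CloudFront-style headers."""
    values = headers.get("authorization")
    if not values:
        return None
    scheme, _, token = values[0]["value"].partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def tenant_from_authorization(
    headers: dict, verify: Verifier, claim: str = "tenant-id"
) -> str | None:
    """Return the tenant claim from a verified JWT, or None if absent."""
    token = bearer_token(headers)
    if token is None:
        return None
    claims = verify(token)  # never route on claims from an unverified token
    tenant = claims.get(claim)
    return tenant if isinstance(tenant, str) and tenant else None
```

The returned tenant ID can then drive origin selection; if `verify` raises, the function should respond with a 401 rather than forward the request.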
Tenant onboarding – Consider tenant routing as part of tenant onboarding. For example, when you provision infrastructure for a new tenant, you can update the Lambda@Edge routing logic to accommodate the new tenant. A more scalable approach is to maintain a mapping of tenants to origins in a low-latency key-value data store, and have your Lambda@Edge function consult it to make routing decisions. When a new tenant is onboarded, you add a new entry for the new route.
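A sketch of the mapping as a small store interface, with an in-memory stand-in for testing. With DynamoDB, `put_if_absent` would correspond to a `PutItem` with `ConditionExpression="attribute_not_exists(tenant_id)"`, assuming a table keyed on a hypothetical `tenant_id` attribute. Sending tenants without an entry to a pooled origin is an assumption that fits a bridge model.

```python
from __future__ import annotations

from typing import Protocol


class RouteStore(Protocol):
    def get(self, tenant_id: str) -> str | None: ...
    def put_if_absent(self, tenant_id: str, origin: str) -> bool: ...


class InMemoryRouteStore:
    """Stand-in for a key-value table keyed by tenant id."""

    def __init__(self) -> None:
        self._routes: dict[str, str] = {}

    def get(self, tenant_id: str) -> str | None:
        return self._routes.get(tenant_id)

    def put_if_absent(self, tenant_id: str, origin: str) -> bool:
        if tenant_id in self._routes:
            return False  # never silently overwrite an existing route
        self._routes[tenant_id] = origin
        return True


def onboard_tenant_route(store: RouteStore, tenant_id: str, origin: str) -> None:
    """Called by the onboarding service after the tenant's resources exist."""
    if not store.put_if_absent(tenant_id, origin):
        raise ValueError(f"route for {tenant_id!r} already exists")


def resolve_origin(store: RouteStore, tenant_id: str, pooled_origin: str) -> str:
    """Called by the routing function; unmapped tenants go to the pool."""
    return store.get(tenant_id) or pooled_origin
```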
Implementation of tenant routing
Implementation of tenant routing varies depending on your SaaS architecture, in particular, the service you use for handling your SaaS application’s entry point. When selecting a service, consider its routing capabilities, alongside more general requirements such as performance, security, and cost.
For example, you may use CloudFront and Lambda@Edge not only to perform data-driven routing, but also to reduce latency and help protect against DDoS attacks.
Another example is an HTTP reverse proxy, whose lightweight nature lets you perform data-driven routing efficiently. This is common in Kubernetes architectures, where you can additionally use a service mesh to manage authenticated requests and routing to tenants. To learn more, see SaaS Identity and Routing with Amazon EKS.
Another implementation approach is to use API Gateway, which offers wildcard custom domains and Lambda authorizers to support both domain-driven and data-driven routing. API Gateway is also popular for SaaS applications because it lets you throttle requests per tenant, minimizing the impact of noisy neighbors and helping optimize performance.
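For a REST API, one way to tie these together is a Lambda authorizer whose result both allows the call and carries tenant context. The sketch builds that result: `context` values become available to integrations (for example as `$context.authorizer.tenantId`), and, when the API key source is set to `AUTHORIZER`, `usageIdentifierKey` selects the API key, and therefore the usage plan and throttling limits, for the tenant. How the authorizer validates the caller and finds the tenant's key is assumed to happen elsewhere.

```python
from __future__ import annotations


def authorizer_response(
    principal_id: str, tenant_id: str, method_arn: str, tenant_api_key: str | None = None
) -> dict:
    """REST API Lambda authorizer result that allows the call with tenant context."""
    response = {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": "Allow", "Resource": method_arn}
            ],
        },
        # Context values must be strings, numbers, or booleans.
        "context": {"tenantId": tenant_id},
    }
    if tenant_api_key:
        # With the API key source set to AUTHORIZER, API Gateway applies the usage
        # plan (and its throttling limits) associated with this key.
        response["usageIdentifierKey"] = tenant_api_key
    return response
```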
Scalability and sharding considerations
Regardless of the tenant routing approach, scalability is a critical consideration as you grow and onboard more tenants.
Service quotas can limit the maximum number of routes in your routing logic. For example, the Application Load Balancer scenario relies on one listener rule per tenant. The CloudFront scenario relies on Lambda@Edge, which has quotas on functions per distribution and requests per second. CloudFront also has quotas for domain names and SSL certificates per distribution. These constraints can lead to different architectural decisions.
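A quick capacity estimate shows how such a quota turns into architecture. In this sketch, `rule_quota` is whatever rules-per-load-balancer quota currently applies to your account (check Service Quotas rather than relying on a fixed number), and `rules_per_tenant` is typically 1 in the listener-rule design.

```python
import math


def load_balancers_needed(tenants: int, rules_per_tenant: int, rule_quota: int) -> int:
    """ALBs required when each tenant consumes rules_per_tenant listener rules."""
    if rules_per_tenant > rule_quota:
        raise ValueError("a single tenant needs more rules than one load balancer allows")
    tenants_per_lb = rule_quota // rules_per_tenant
    return math.ceil(tenants / tenants_per_lb)
```

For example, with a hypothetical quota of 100 rules and one rule per tenant, 250 tenants need `load_balancers_needed(250, 1, 100) == 3` load balancers, each of which then needs its own routing in front of it.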
One advanced approach to overcome these scale limitations is a cell-based architecture, where you partition tenants into shards. Here, you first route an incoming request to the shard that contains the tenant, and then to the tenant within that shard.
Figure 5: Tenant routing in SaaS applications with resource sharding for pooled tenants
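The two-step routing can be sketched with an explicit tenant-to-shard map, which, unlike pure hashing, lets you move a tenant between cells later. The shard names, endpoints, and per-shard capacity are hypothetical; the first step returns the shard's entry point, and routing to the tenant inside the shard (for example, via that shard's own listener rules) happens there.

```python
from __future__ import annotations

# Hypothetical cell map: which tenants live in which shard, and each shard's entry point.
TENANT_TO_SHARD = {"tenant1": "shard-a", "tenant2": "shard-a", "tenant3": "shard-b"}
SHARD_ENDPOINTS = {
    "shard-a": "shard-a.internal.example.com",
    "shard-b": "shard-b.internal.example.com",
}


def assign_shard(
    tenant_id: str, tenant_to_shard: dict[str, str], shards: list[str], capacity: int
) -> str:
    """Place a new tenant in the least-loaded shard that still has room."""
    if tenant_id in tenant_to_shard:
        return tenant_to_shard[tenant_id]
    counts = {s: 0 for s in shards}
    for s in tenant_to_shard.values():
        if s in counts:
            counts[s] += 1
    open_shards = [s for s in shards if counts[s] < capacity]
    if not open_shards:
        raise RuntimeError("all shards are full; provision a new cell")
    chosen = min(open_shards, key=lambda s: counts[s])
    tenant_to_shard[tenant_id] = chosen
    return chosen


def route_request(
    tenant_id: str, tenant_to_shard: dict[str, str], shard_endpoints: dict[str, str]
) -> tuple[str, str]:
    """Step 1: pick the shard endpoint. Step 2 happens inside the shard using tenant_id."""
    shard = tenant_to_shard.get(tenant_id)
    if shard is None:
        raise KeyError(f"unknown tenant {tenant_id!r}")
    return shard_endpoints[shard], tenant_id
```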
Conclusion
Tenant routing requires careful consideration when building SaaS applications. In this post, you explored two strategies: domain-driven and data-driven routing. The post also covered a range of approaches, from a simple subdomain per tenant to advanced dynamic routing based on the content of HTTP requests. When designing your tenant routing strategy, consider trade-offs across scalability, operations, and cost. More importantly, make sure it aligns with your SaaS model, and work backward from your desired customer experience. By evaluating each approach and aligning it with your business objectives, you can build a secure and scalable SaaS application that meets the evolving needs of your customers.
To learn more about building SaaS applications, visit the AWS Well-Architected SaaS Lens.