AWS Partner Network (APN) Blog
VTEX built a cost-per-tenant strategy for its e-commerce platform on AWS
By Diogo Filipe Dornelas Falcão, Staff Data Engineer, VTEX
By Caio Vinicius Canic Silva, Senior Analytics Engineer, VTEX
By Jose Augusto Ferronato, Solutions Architect, AWS
By Tiago Reichert, Sr. GTM SSA, Containers, AWS
VTEX, a global e-commerce company serving more than 3,400 customers across 38 countries, has undergone a transformative journey since its inception in 2000. Evolving from B2B textile software into a cloud-native, microservices-based e-commerce platform, VTEX has prioritized efficiency, resilience, and innovation. As part of its journey toward a modernized SaaS application, VTEX realized that managing costs while delivering seamless shopping experiences is essential.
This blog post explores VTEX’s project to manage costs in a multi-tenant application by providing cost visibility at the individual tenant level. This visibility enables a deeper understanding and better control of the expenses associated with each customer on the platform.
Understanding the Challenge
In VTEX’s e-commerce platform, a tenant refers to a distinct retailer or customer using the platform to manage their own products, customers, and transactions. All tenants share the same underlying infrastructure (pooled model), which poses a challenge: accurately determining and managing the costs associated with each tenant. This is essential for optimizing investments and ensuring financial sustainability. Read the AWS Well-Architected SaaS Lens for more details about this subject.
A multi-tenant architecture benefits from aggregated demand, which aids scalability, because many retailers use the same infrastructure to sell their products. However, each tenant has a unique set of data and usage patterns, which makes precise cost allocation critical. (Security and privacy controls are also a key concern, but they are out of scope for this post.) The issue goes beyond basic operational concerns: it requires accurately calculating the infrastructure, services, and support costs for each tenant so that the platform remains financially viable and competitive.
Accurate cost per tenant is fundamental for three reasons:
- Financial Viability: ensures that the platform can recover its expenses and remain competitive.
- Resource Optimization: helps identify opportunities to optimize how each tenant consumes resources.
- Fair Pricing: supports a fair distribution of costs based on tenant usage, which drives innovation and efficiency.
Solution Architecture
To address the challenges of cost allocation in pooled environments, applications must provide detailed telemetry on tenant consumption. Figure 1 illustrates how tenant usage can be correlated with overall infrastructure costs to calculate the cost per tenant.
Figure 1: High-level diagram on tenant-aware cost allocation
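The correlation in Figure 1 can be sketched as a proportional split: each tenant receives the same share of a pooled cost as its share of measured usage. The function below is a minimal, hypothetical illustration of that idea, not VTEX's implementation. The name `allocate_cost`, the tenant names, and the figures are all made up for the example.

```python
from typing import Dict


def allocate_cost(total_cost: float, usage_by_tenant: Dict[str, float]) -> Dict[str, float]:
    """Split a pooled cost across tenants in proportion to each tenant's usage."""
    total_usage = sum(usage_by_tenant.values())
    if total_usage == 0:
        # No measured usage: nothing can be attributed proportionally.
        return {tenant: 0.0 for tenant in usage_by_tenant}
    return {
        tenant: total_cost * usage / total_usage
        for tenant, usage in usage_by_tenant.items()
    }


# Hypothetical figures: a 1,000 USD shared bill and request counts per tenant.
shares = allocate_cost(1000.0, {"tenant-a": 600, "tenant-b": 300, "tenant-c": 100})
print(shares)  # {'tenant-a': 600.0, 'tenant-b': 300.0, 'tenant-c': 100.0}
```

A proportional split always sums back to the original bill (up to floating-point rounding), which keeps allocated costs reconcilable with the invoice.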
VTEX considered the following metrics to implement their tenant-aware cost allocation:
- Data Traffic: Tracks the volume of data exchanged between the platform and tenants, including API requests, responses, and interactions with internal and external services. It is influenced by the number of sessions, orders, and the complexity of queries.
- Storage: Measures the amount of data stored for each tenant, covering product, customer, and order information. Factors affecting storage include catalog size, customer count, and update frequency across internal systems.
- Backend Hosting: Resources allocated for customization and extensibility, such as CPU, memory, and network usage for hosting applications and APIs. Influencing factors are the volume of requests, latency, and the performance of the code.
- Storefront Consumption: Resources used to serve storefronts, including store rendering and CDN for web pages and static assets. It is influenced by visitor numbers, page design, and asset optimization.
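One way to combine these four metrics is to treat each one as the allocation key for its own cost pool. Data traffic cost is split by traffic, storage cost by stored data, and so on, and each tenant's shares are then summed. The sketch below assumes this per-dimension approach and uses hypothetical pool amounts. It is an illustration, not a description of VTEX's exact formula.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical monthly cost pools (USD), one per metric described above.
COST_POOLS: Dict[str, float] = {
    "data_traffic": 400.0,
    "storage": 200.0,
    "backend_hosting": 300.0,
    "storefront": 100.0,
}


def cost_per_tenant(
    pools: Dict[str, float],
    usage: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Allocate each cost pool by its own usage metric, then sum per tenant.

    `usage` maps metric name -> {tenant: measured usage for that metric}.
    """
    totals: Dict[str, float] = defaultdict(float)
    for metric, pool_cost in pools.items():
        metric_usage = usage.get(metric, {})
        metric_total = sum(metric_usage.values())
        if metric_total == 0:
            continue  # The pool stays unallocated when no usage was measured.
        for tenant, amount in metric_usage.items():
            totals[tenant] += pool_cost * amount / metric_total
    return dict(totals)


# Hypothetical usage per metric (e.g. GB transferred, GB stored, CPU-hours, page views).
usage = {
    "data_traffic": {"tenant-a": 3, "tenant-b": 1},
    "storage": {"tenant-a": 1, "tenant-b": 1},
    "backend_hosting": {"tenant-a": 1, "tenant-b": 2},
    "storefront": {"tenant-a": 1, "tenant-b": 3},
}
print(cost_per_tenant(COST_POOLS, usage))  # {'tenant-a': 525.0, 'tenant-b': 475.0}
```

Allocating per dimension avoids charging a storage-heavy tenant for compute it does not use. A single blended usage ratio would blur that distinction.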
These metrics involve multiple data sources that must be mapped accurately, such as internal proxy logs, AWS Cost and Usage Reports (CUR), and other internal information. Figure 2 shows the architectural decisions VTEX made to manage and analyze these data sources effectively.
The architecture shown below represents the flow of information within VTEX. Data is gathered from a variety of sources, including VTEX platform services and AWS services, then stored in an Amazon Redshift cluster. Additionally, business rules are implemented to trigger alerts and engage the appropriate teams whenever predefined thresholds are exceeded.
Figure 2: VTEX tenant-aware cost allocation architecture
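The threshold-based alerting mentioned above can be pictured as a simple rule check over per-tenant metrics. In the sketch below, the metric names, the threshold values, and the `Alert` type are hypothetical. Routing an alert to the right team is left to whatever notification mechanism is in place.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    tenant: str
    metric: str
    value: float
    threshold: float


def check_thresholds(
    metrics_by_tenant: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
) -> List[Alert]:
    """Return one Alert per (tenant, metric) whose value exceeds its threshold."""
    alerts: List[Alert] = []
    for tenant, metrics in sorted(metrics_by_tenant.items()):
        for metric, value in sorted(metrics.items()):
            limit = thresholds.get(metric)
            if limit is not None and value > limit:
                alerts.append(Alert(tenant, metric, value, limit))
    return alerts


# Hypothetical daily metrics and business-rule thresholds.
metrics = {
    "tenant-a": {"cost_per_order": 0.8, "daily_cost": 120.0},
    "tenant-b": {"cost_per_order": 0.2, "daily_cost": 40.0},
}
rules = {"cost_per_order": 0.5, "daily_cost": 100.0}
for alert in check_thresholds(metrics, rules):
    print(alert.tenant, alert.metric)
```

Here only `tenant-a` breaches both rules. Sorting the keys keeps the alert order deterministic, which simplifies deduplication downstream.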
Data Collection
A comprehensive metrics framework was built using three key types of data, crucial for understanding costs, measuring tenant performance, and analyzing platform usage. These data sources provide a holistic view of the platform’s financials, performance, and usage. This approach enables tenant-aware cost allocation and informed decision-making across the e-commerce ecosystem.
Proxy Metric for Tenant Platform Usage
A proxy metric was developed to gauge tenant usage of the platform, focusing on latency and the volume of requests to the apps and APIs. This composite metric provides a detailed view of tenant interactions and usage patterns.
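The source does not say exactly how request volume and latency are combined. A plausible composite, assumed here purely for illustration, is total latency per tenant: request count multiplied by mean latency. This approximates how much platform time each tenant occupies. The function name and record shape below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def proxy_usage(requests: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (tenant, latency_ms) records into a per-tenant composite metric.

    The composite `usage_ms` is total latency (request count x mean latency),
    a hypothetical stand-in for the share of platform time a tenant consumes.
    """
    count: Dict[str, int] = defaultdict(int)
    total_ms: Dict[str, float] = defaultdict(float)
    for tenant, latency_ms in requests:
        count[tenant] += 1
        total_ms[tenant] += latency_ms
    return {
        tenant: {
            "requests": float(count[tenant]),
            "mean_latency_ms": total_ms[tenant] / count[tenant],
            "usage_ms": total_ms[tenant],
        }
        for tenant in count
    }


# Hypothetical request samples: (tenant, latency in milliseconds).
samples = [("tenant-a", 100.0), ("tenant-a", 300.0), ("tenant-b", 50.0)]
print(proxy_usage(samples)["tenant-a"]["usage_ms"])  # 400.0
```

Weighting by latency means a tenant whose few requests are very slow still registers meaningful usage. A raw request count would miss that.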
Traffic is captured through VTEX’s proprietary cloud API gateway, which enables consumption-oriented metrics and insights into the traffic from each account. The API gateway also monitors tenant usage beyond border traffic. Logs are collected by an API request router and stored in Amazon OpenSearch Service, where they are pre-aggregated and filtered before being sent to Redshift.
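The pre-aggregation step reduces raw request logs to compact per-account, per-hour rows before they are loaded into the warehouse. The sketch below shows the idea in plain Python rather than in OpenSearch. The log fields (`account`, `timestamp`, `path`, `bytes`) and the health-check filter are hypothetical assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (account, hour bucket)


def pre_aggregate(logs: Iterable[dict]) -> List[Tuple[str, str, int, int]]:
    """Filter raw router log entries and roll them up to (account, hour) rows."""
    requests: Dict[Key, int] = defaultdict(int)
    bytes_sent: Dict[Key, int] = defaultdict(int)
    for entry in logs:
        account = entry.get("account")
        # Hypothetical filter: drop unattributed requests and health checks.
        if not account or entry.get("path") == "/healthcheck":
            continue
        hour = datetime.fromisoformat(entry["timestamp"]).strftime("%Y-%m-%dT%H:00")
        key = (account, hour)
        requests[key] += 1
        bytes_sent[key] += int(entry.get("bytes", 0))
    # Compact rows, sorted for deterministic loading into the warehouse.
    return sorted(
        (account, hour, requests[(account, hour)], bytes_sent[(account, hour)])
        for account, hour in requests
    )


# Hypothetical raw log entries.
raw = [
    {"account": "storea", "timestamp": "2024-05-01T10:15:00", "path": "/api/catalog", "bytes": 100},
    {"account": "storea", "timestamp": "2024-05-01T10:45:00", "path": "/api/orders", "bytes": 50},
    {"account": "storea", "timestamp": "2024-05-01T11:05:00", "path": "/api/orders", "bytes": 10},
    {"account": "storeb", "timestamp": "2024-05-01T10:20:00", "path": "/healthcheck", "bytes": 5},
    {"account": None, "timestamp": "2024-05-01T10:30:00", "path": "/api/orders", "bytes": 7},
]
print(pre_aggregate(raw))
```

Aggregating before the load keeps the warehouse tables small while preserving the per-tenant granularity that cost allocation needs.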
Total Platform Cost
AWS Cost and Usage Reports (AWS CUR) enable advanced data transformations and the creation of metrics by providing detailed, accurate cost attribution with product tags. This level of detail and precision makes CUR essential for effective resource allocation. CUR generation is automated for all VTEX accounts, and the data is imported into Redshift. There, costs are analyzed, grouped, and filtered using AWS service tags, allowing expenses to be broken down by application and team.
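Grouping CUR line items by a tag is the core of the breakdown described above. VTEX performs this grouping in Redshift. The Python sketch below only illustrates the logic on a tiny CSV sample. It assumes the legacy CUR CSV column naming (`lineItem/UnblendedCost`, `resourceTags/user:<key>`) and a hypothetical cost allocation tag named `application`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

COST_COLUMN = "lineItem/UnblendedCost"
TAG_COLUMN = "resourceTags/user:application"  # Hypothetical tag key.


def cost_by_tag(cur_csv: str, tag_column: str = TAG_COLUMN) -> Dict[str, float]:
    """Sum unblended cost per tag value; untagged line items are bucketed together."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(cur_csv)):
        tag_value = row.get(tag_column) or "untagged"
        totals[tag_value] += float(row[COST_COLUMN] or 0)
    return dict(totals)


# Tiny hypothetical CUR extract with only the two columns used here.
sample = (
    "lineItem/UnblendedCost,resourceTags/user:application\n"
    "1.50,checkout\n"
    "2.25,catalog\n"
    "0.25,checkout\n"
    "3.00,\n"
)
print(cost_by_tag(sample))  # {'checkout': 1.75, 'catalog': 2.25, 'untagged': 3.0}
```

Keeping an explicit `untagged` bucket makes gaps in tagging coverage visible instead of silently dropping that spend.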
Sellers Metrics
These metrics capture various facets of tenant performance, including the number of orders, sessions, and Gross Merchandise Volume (GMV). Analyzing them offers deep insight into operational effectiveness.
These metrics are ingested in near real time using an in-house solution built on AWS services such as AWS Lambda, Amazon Kinesis, and Amazon Redshift. Cost and revenue metrics are further refined with data such as contract prices from the internal billing system, which is queried in a transactional database on Amazon RDS through Amazon Redshift Federated Query.
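Joining allocated cost with seller metrics yields unit economics such as cost per order (a dimension shown later in Figure 4) and cost as a percentage of GMV. The sketch below assumes simple in-memory dictionaries. The field names `orders` and `gmv` are hypothetical placeholders for the ingested seller metrics.

```python
from typing import Dict, Optional


def unit_costs(
    cost_by_tenant: Dict[str, float],
    seller_metrics: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Join allocated cost with seller metrics (orders, GMV) per tenant."""
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for tenant, cost in cost_by_tenant.items():
        metrics = seller_metrics.get(tenant, {})
        orders = metrics.get("orders", 0)
        gmv = metrics.get("gmv", 0.0)
        result[tenant] = {
            # None marks an undefined ratio instead of a misleading zero.
            "cost_per_order": cost / orders if orders else None,
            "cost_pct_of_gmv": 100.0 * cost / gmv if gmv else None,
        }
    return result


# Hypothetical allocated costs and seller metrics for one period.
costs = {"tenant-a": 525.0, "tenant-b": 475.0}
sellers = {
    "tenant-a": {"orders": 1050, "gmv": 52500.0},
    "tenant-b": {"orders": 0, "gmv": 0.0},
}
print(unit_costs(costs, sellers)["tenant-a"])  # {'cost_per_order': 0.5, 'cost_pct_of_gmv': 1.0}
```

A tenant with cost but no orders (such as `tenant-b`) surfaces as `None`, which is itself a useful signal for follow-up.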
Data Processing and Storage
Amazon Redshift is central to VTEX’s data processing and storage strategy. All the data described above is ingested into Redshift, where it is combined with other essential tables in the cluster and integrated with other AWS services. Within Redshift, VTEX runs Extract, Load, Transform (ELT) processes on this data, producing a set of gold tables. These tables are then used in Amazon QuickSight for visualization, or in raw form for daily reports used by multiple teams.
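In an ELT flow, raw data is loaded first and then transformed with SQL inside the warehouse into curated ("gold") tables. The sketch below uses Python's built-in `sqlite3` as a local stand-in for Redshift to show that pattern. The table and column names (`raw_tenant_cost`, `gold_tenant_daily_cost`) are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, float]  # (tenant, day, cost)


def load_raw(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Load step: land raw rows unchanged in a staging table."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_tenant_cost (tenant TEXT, day TEXT, cost REAL)")
    conn.executemany("INSERT INTO raw_tenant_cost VALUES (?, ?, ?)", rows)


def transform_gold(conn: sqlite3.Connection) -> None:
    """Transform step: rebuild the gold table with SQL inside the database."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS gold_tenant_daily_cost;
        CREATE TABLE gold_tenant_daily_cost AS
        SELECT tenant, day, SUM(cost) AS total_cost
        FROM raw_tenant_cost
        GROUP BY tenant, day;
        """
    )


def read_gold(conn: sqlite3.Connection) -> List[Row]:
    return conn.execute(
        "SELECT tenant, day, total_cost FROM gold_tenant_daily_cost ORDER BY tenant, day"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, [("a", "2024-05-01", 1.5), ("a", "2024-05-01", 0.5), ("b", "2024-05-01", 3.0)])
transform_gold(conn)
print(read_gold(conn))  # [('a', '2024-05-01', 2.0), ('b', '2024-05-01', 3.0)]
```

Transforming inside the warehouse keeps the raw data available for reprocessing when business rules change.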
Tenant Margin and Score
VTEX evaluates margin health per tenant by analyzing the platform cost associated with each tenant in relation to their contract prices and the revenue they generate. This analysis correlates these financial metrics to produce a score that reflects the profitability and overall financial health of each tenant.
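The exact scoring formula is not described. As a hypothetical illustration, the sketch below computes a margin from revenue and allocated cost, then maps it linearly onto a 0 to 100 score between an assumed floor and target margin. The bounds `floor=-0.5` and `target=0.8` are arbitrary placeholders.

```python
def margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive to compute a margin")
    return (revenue - cost) / revenue


def health_score(m: float, floor: float = -0.5, target: float = 0.8) -> float:
    """Map a margin onto 0..100, clamped; floor and target are hypothetical bounds."""
    scaled = (m - floor) / (target - floor)
    return round(100 * min(max(scaled, 0.0), 1.0), 1)


# Hypothetical tenant: 1,000 of revenue (e.g. from contract price) against 350 of allocated cost.
m = margin(1000.0, 350.0)
print(health_score(m))  # 88.5
```

A clamped linear map keeps scores comparable across tenants of very different sizes, because it depends only on the margin ratio and not on absolute spend.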
Data Visualization
The transformed and aggregated data from Redshift is the foundation for the Amazon QuickSight dashboards. QuickSight is a visualization tool, and these dashboards offer rich insights into the cost, performance, and usage patterns of VTEX’s multi-tenant platform. QuickSight enables the creation of customized tables, views, and visualizations that distill complex data into intuitive representations.
Figures 3 and 4 show examples of how VTEX analyzes data using the solution: Figure 3 represents the overall costs, while Figure 4 shows a filtered view for a specific tenant.
Figure 3: Overview of all tenants with filters for specific tenants
Figure 4: Additional dimensions (such as cost per order) filtered by tenant
Conclusion
By implementing a solution for calculating cost per tenant, VTEX has ensured financial viability, promoted resource optimization, and fostered a collaborative environment with its clients. The integration of AWS services and the development of detailed metrics and dashboards have empowered VTEX to make better-informed strategic decisions that guide the company’s trajectory. This approach has also improved alignment between clients and business goals through shared insights. It has also led to more efficient platform utilization and lower expenses through data-driven optimization.
VTEX – AWS Partner Spotlight
VTEX is an AWS Advanced Technology Partner, an AWS Competency Partner for Retail, and a leading e-commerce and marketplace platform provider, offering a suite of solutions for businesses to create, manage, and grow their online retail operations. VTEX is known for its flexibility, allowing businesses to customize their storefronts and integrate with various third-party services. The platform supports both B2C and B2B commerce models and has gained recognition for its innovative approach to collaborative commerce, helping businesses create their own ecosystems of sellers and partners.