AWS Database Blog

Improve cost visibility of an Amazon RDS multi-tenant instance with Performance Insights and Amazon Athena

Multi-tenancy allows service providers to share a single instance of an application and its infrastructure among multiple customers. Although this model offers benefits in resource efficiency and scalability, it poses challenges in cost management and allocation.

Service providers often want to maximize efficiency by grouping multiple tenants within the same database instance and accurately allocating costs to individual tenants, which requires data that is often difficult to get in a shared environment. The main problem is determining how much of the database resources each tenant consumes when multiple tenants share a DB instance.

In this post, we introduce a solution that addresses a common challenge faced by many customers: managing costs in multi-tenant applications, particularly for shared databases in Amazon Relational Database Service (Amazon RDS) and Amazon Aurora. The solution uses Amazon RDS Performance Insights and AWS Cost and Usage Reports (CUR) to address this challenge. It allows for efficient grouping of tenants within the same RDS or Aurora instances, while helping you implement accurate chargeback models, optimize resource-intensive workloads, and make data-driven decisions for capacity planning.

What is Performance Insights?

Performance Insights is an Amazon RDS feature that provides advanced performance monitoring and analysis for your database. It offers a set of metrics about your database load that you can retrieve using the AWS SDK or AWS Command Line Interface (AWS CLI), or visualize in the user-friendly dashboard.

Performance Insights allows you to break down the load by waits, SQL statements, database hosts, and users, and spot performance bottlenecks at a glance. Performance Insights gives you a granular understanding of the current database usage and potential performance issues.

In this solution, we use the AWS SDK to perform the following actions:

  • Get the data of the load metric from Performance Insights, sliced by the database users
  • Aggregate and visualize data to join the information with the CUR dataset
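
The first action above can be sketched with the AWS SDK for Python (Boto3). This is a minimal sketch, not the repository's Lambda code: the function names are ours, and the Identifier parameter expects the instance's DbiResourceId (a value beginning with db-), not its ARN.

```python
from datetime import datetime, timedelta, timezone

def build_metric_queries():
    # Request db.load.avg sliced by the "db.user" dimension group
    return [{"Metric": "db.load.avg", "GroupBy": {"Group": "db.user", "Limit": 10}}]

def fetch_db_load_by_user(resource_id, region="us-east-1"):
    # boto3 is imported lazily; the call below requires AWS credentials
    import boto3
    pi = boto3.client("pi", region_name=region)
    end = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
    resp = pi.get_resource_metrics(
        ServiceType="RDS",
        Identifier=resource_id,              # DbiResourceId of the DB instance
        MetricQueries=build_metric_queries(),
        StartTime=end - timedelta(hours=1),
        EndTime=end,
        PeriodInSeconds=3600,                # one data point for the whole hour
    )
    return resp["MetricList"]
```

Each entry in the returned MetricList carries the dimension values (here, the database user name) plus the hourly data points, which the Lambda function can then flatten before writing to Amazon S3.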

Performance Insights is available for the majority of database engines supported by Amazon RDS and Aurora (see Supported Regions and Aurora DB engines for Performance Insights).

Solution overview

The proposed solution depends upon a number of components, as illustrated in the following diagram.

architectural diagram

The workflow consists of the following steps:

  1. An AWS Lambda function collects Performance Insights metrics every hour. Metrics are already segmented by database users; we provide additional details later in this post.
  2. The Lambda function saves the metric data in an Amazon Simple Storage Service (Amazon S3) bucket as Parquet files, organized by AWS account, year, month, day, and hour.
  3. CUR stores detailed billing data in Amazon S3 as compressed CSV files, organized in a time-based partitioned structure for efficient querying and analysis.
  4. An AWS Glue crawler updates the Amazon Athena table schema and partitions hourly.
  5. Athena queries and joins the data from both Performance Insights and the CUR to attribute costs to specific database users (tenants).
  6. Amazon QuickSight integrates with Athena, allowing users to visualize SQL query results from Amazon S3 data and create interactive dashboards and reports.
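
To make step 2 concrete, the following sketch shows a Hive-style object key the Lambda function could write; the bucket prefix and file name are illustrative, not taken from the repository, but the partition columns match the layout described above so that the AWS Glue crawler can register them.

```python
from datetime import datetime

def metrics_object_key(account_id: str, ts: datetime) -> str:
    # Hive-style partitioning by account, year, month, day, and hour,
    # so the Glue crawler registers each path segment as a partition column
    return (
        f"pi-metrics/account_id={account_id}/year={ts.year}/"
        f"month={ts.month:02d}/day={ts.day:02d}/hour={ts.hour:02d}/metrics.parquet"
    )
```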

A critical aspect of the solution is the use of the db.load metric from Performance Insights to evaluate database utilization. We chose this metric for the following reasons:

  • It represents the average number of active sessions for the DB instance, providing a high-level view of database activity
  • It’s normalized against the number of vCPUs, making it comparable across different instance sizes and types
  • It helps you understand if you’re approaching instance limits, or if your instance is under-utilized
  • It’s available across the majority of Amazon RDS database engines at the time of writing, allowing you to port the solution to your databases

The db.load metric is different from the other time-series metrics because you can break it into subcomponents called dimensions. You can think of dimensions as “slice by” categories for the different characteristics of the db.load metric.

In this solution, we slice the db.load category by user and join those results with the data from CUR, providing a comprehensive view of both resource utilization and associated costs per tenant. This enables more accurate cost attribution, helps identify optimization opportunities, and supports data-driven capacity planning decisions.

As we previously mentioned, there can be various models of multi-tenancy, where resource segregation can vary at different levels such as user, database, schema, or even in a pool model. Each model has its own advantages and considerations in terms of data isolation, security, maintenance, and resource allocation.

In this post, we assume that tenants have separate database users, and the cost allocation is distributed accordingly. However, you can apply the same concept to other multi-tenancy models. For instance, in a scenario where tenants are segregated at the database level rather than the user level, costs could be split per database instead. When using Amazon RDS for Oracle with the Multitenant architecture, you could also choose to split by pluggable database (PDB).

In the following sections, we walk you through how to set up this solution and interpret the results, empowering you to take control of your multi-tenant Amazon RDS costs and performance.

Prerequisites

Before you begin, make sure you have met the following prerequisites:

  • You have an active AWS account. Otherwise, you can create a new one.
  • You have one or more RDS instances running engines that support Performance Insights, preferably with multiple tenants or applications sharing the same instance.
  • Performance Insights is enabled on the RDS instances.
  • CUR 2.0 is enabled and configured to output to an S3 bucket, with the Resource ID option enabled and with Hourly time granularity.
  • You have basic familiarity with Amazon RDS, Lambda, Amazon S3, and Athena.

If your CUR is activated in a different AWS account, consider configuring cross-account access permissions for the CUR Glue table.

The solution considers compute costs only; other costs associated with your instance, such as data transfer, storage, and so on, are excluded. This is a current limitation of the proposed solution.

Deploy the solution

First, deploy the AWS CloudFormation templates from the provided AWS Cloud Development Kit (AWS CDK) application. The templates create the following resources:

  • The Lambda function that collects metrics
  • An S3 bucket to store data
  • The Athena views to perform analysis
  • An Amazon EventBridge rule to invoke the Lambda function every hour

To deploy the templates, run the deployment instructions provided in the GitHub repository README.md, under the Setup Projects and Deploy Stacks sections.

We built this solution using AWS Serverless services, with costs scaling based on your usage. To estimate costs for data analysis, we recommend reviewing the pricing pages for Amazon Athena and Amazon QuickSight.

While deploying, you must provide two parameters:

  • glueCURDBName – The AWS Glue database name currently used for the CUR
  • glueCURDBTable – The AWS Glue database table currently used for the CUR

After the deployment, the Lambda function is configured to run automatically every hour. It continuously collects and processes data as part of its scheduled execution.

Before proceeding with the next steps, wait for the next scheduled execution of the Lambda function. This allows the system to accumulate sufficient data in the S3 bucket.

Analyze Performance Insights metrics

Now, let’s run a comprehensive query to analyze our performance data. We use this query later to join the Performance Insights information related to consumption with the data contained in the CUR. This query will give us detailed insights into database utilization per user and overall utilization. See the following code:

WITH aggregate_load_data AS (
    SELECT
        timestamp,
        resourcearn,
        AVG(num_vcpus) AS num_vcpus,
        SUM(value) AS total_db_load,
        GREATEST(AVG(num_vcpus), SUM(value)) AS total_compute_power,
        COUNT(1) AS num_users
    FROM "AwsDataCatalog"."<glueDBName>"."<piDBTable>"
    GROUP BY 1, 2
)
SELECT
    b.timestamp,
    b.account_id,
    b.resourcearn,
    b.num_vcpus,
    b."db.user.name" AS user_name,
    b.value AS db_load,
    a.total_db_load,
    a.total_compute_power,
    a.num_users AS distinct_users,
    CASE WHEN a.total_db_load = 0 THEN 0 ELSE b.value / a.total_db_load END AS perc_utilization,
    (b.value / a.total_compute_power) AS perc_utilization_rebased
FROM aggregate_load_data a
JOIN "AwsDataCatalog"."<glueDBName>"."<piDBTable>" b
    ON a.timestamp = b.timestamp AND a.resourcearn = b.resourcearn

You can find this query already deployed by the previous stack as an Athena view. To run it, navigate to the Athena console and look for the pi-data-view view under the AwsDataCatalog data source and the AWS Glue database name (rds-performance-insights-db if you kept the parameter blank).

athena query result

The Athena query results display percentages as decimal values between 0 and 1, with some small numbers represented in scientific notation for precision.

This query provides the following useful information:

  • It shows the database load (db_load) for each user at a given timestamp
  • It calculates the total database load (total_db_load) across all users in the hour
  • It determines the percentage utilization (perc_utilization_rebased) for each user

We will use this data for the following actions:

  • Identify which users are consuming the most resources
  • Determine periods of high overall utilization
  • Spot times when the database is under-utilized, helping you make decisions about potential downsizing or adopting a serverless option
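
As a minimal illustration of how the view derives perc_utilization_rebased, the following Python sketch mirrors the SQL logic; the function name is ours and the tenant names are hypothetical.

```python
def utilization_per_user(user_loads: dict, num_vcpus: float) -> dict:
    # total_compute_power = greatest(vCPUs, total db.load), as in the Athena view:
    # if the total load exceeds the vCPU count the instance is saturated, so each
    # user's share is rebased against the larger of the two values
    total_load = sum(user_loads.values())
    total_compute_power = max(num_vcpus, total_load)
    return {user: load / total_compute_power for user, load in user_loads.items()}
```

For example, with loads of 1.0 and 3.0 active sessions on an 8-vCPU instance, the rebased shares are 0.125 and 0.375, leaving 0.5 of the instance's compute power unattributed.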

Identifying which users consume the most resources is an important consideration in databases used by multiple tenants. In such a scenario, the database service provider often charges tenants based on the resources they consume.

In this solution, we use Performance Insights to provide information split by user, assuming that multi-tenancy is implemented with separate users per tenant. It’s also possible to get information split per database if that aligns better with your multi-tenancy strategy.

Join Performance Insights data with the CUR

In this section, we join the information extracted from Performance Insights with the CUR to match the database consumption with the actual database cost. This allows you to allocate the actual database costs to users in relation to the actual computation used.

We use the data from Performance Insights together with the CUR data, joining them on the field line_item_resource_id (for RDS instances, this is the DB instance ARN, matching the resourcearn column from Performance Insights) and on the timestamp, because the CUR has an hourly granularity.

We also want to focus on RDS instances, so we filter accordingly using the columns line_item_product_code and product_instance_type. We use the inner join because we only want to get data on instances that have Performance Insights enabled.

SELECT
    cur.line_item_usage_start_date                                  AS timestamp,
    pi_view.user_name                                               AS tenant_id,
    cur.line_item_unblended_cost                                    AS database_cost,
    pi_view.perc_utilization_rebased * cur.line_item_unblended_cost AS tenant_cost,
    pi_view.total_compute_power,
    pi_view.perc_utilization_rebased                                AS perc_utilization_rebased,
    cur.line_item_usage_type,
    cur.line_item_line_item_type,
    cur.line_item_resource_id,
    cur.product['database_engine']                                  AS product_database_engine,
    cur.product_instance_type
FROM "AwsDataCatalog"."<glueCURDBName>"."<CURDBTable>" cur
INNER JOIN "AwsDataCatalog"."<glueDBName>"."<piDBView>" pi_view ON
    cur.line_item_resource_id = pi_view.resourcearn AND
    cur.line_item_usage_start_date = CAST(pi_view.timestamp AS timestamp)
WHERE cur.line_item_product_code = 'AmazonRDS' AND cur.product_instance_type <> ''

The result of this query provides a detailed breakdown of RDS instance costs per tenant for each hour.

The tenant_cost column shows the portion of the hourly cost attributed to each tenant based on their database utilization in the column perc_utilization_rebased.

This view provides additional useful information:

  • It shows the actual tenant_cost for each user at a given timestamp
  • It shows for reference the database_cost as it appears in the CUR

We can use this data to correctly allocate the costs related to the DB instance nodes to different users, based on the percentage of usage derived from the db_load value.
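
The per-tenant arithmetic reduces to a single multiplication. The following sketch (function name ours) mirrors the tenant_cost expression in the view:

```python
def allocate_hourly_cost(hourly_cost: float, utilization: dict) -> dict:
    # tenant_cost = perc_utilization_rebased * line_item_unblended_cost
    return {user: share * hourly_cost for user, share in utilization.items()}
```

With an hourly instance cost of $0.40 and rebased shares of 0.125 and 0.375, the attributed costs are $0.05 and $0.15, and the remaining $0.20 is unattributed idle capacity.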

You can find this query already deployed by the CloudFormation stacks as an Athena view. To run it, navigate to the Athena console and look for the rds_cost_allocation_view view, as shown in the following screenshot.

athena query result

Discover database under-utilization

From the cost allocation view, you can create an additional view to uncover the amount of costs and usage that can’t be attributed to any specific user, primarily due to database inactivity.

This view can help identify periods of under-utilization and potential cost savings. To achieve this, we create a new view on top of the rds_cost_allocation_view view, so that we have a single row for each hour, comparing the instance cost with the actual usage percentage derived from the db.load metric. The query to define the view is as follows:

SELECT
    date_trunc('hour', timestamp)           AS timestamp,           -- timestamp (one row per hour)
    MAX(database_cost)                      AS database_cost,       -- actual node cost
    SUM(perc_utilization_rebased)           AS database_usage,      -- usage percentage within the hour
    1 - SUM(perc_utilization_rebased)       AS unused_percentage,   -- percentage of inactivity
    MAX(database_cost) - SUM(tenant_cost)   AS unused_cost          -- cost due to inactivity
FROM "AwsDataCatalog"."<glueDBName>"."<costAllocationViewName>"
GROUP BY date_trunc('hour', timestamp)

You can find this query already deployed by the CloudFormation stacks as an Athena view. To run it, navigate to the Athena console and look for the rds_unused_cost_view view, as shown in the following screenshot.

athena query result

With this new view, we now have a row for each hour of database operation. This granular data makes it straightforward to spot periods where the unused_cost is very close to or equal to the database_cost (which is the cost of database nodes obtained from the CUR). This highlights situations where database activity is extremely low or non-existent.

By monitoring and analyzing these periods of low activity and high unused_cost, you can make data-driven decisions to optimize your database operations, reduce costs, and improve overall system efficiency.
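
A minimal sketch of the per-hour arithmetic behind this view (function name and inputs are ours, not from the repository):

```python
def unused_metrics(database_cost: float, tenant_costs: dict, shares: dict) -> dict:
    # One row per hour: compare the instance cost with the usage actually attributed
    used = sum(shares.values())                  # sum(perc_utilization_rebased)
    return {
        "database_usage": used,
        "unused_percentage": 1 - used,
        "unused_cost": database_cost - sum(tenant_costs.values()),
    }
```

An hour where unused_percentage approaches 1 means almost the entire instance cost for that hour was spent on idle capacity.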

Report the costs per tenant

The following screenshot shows a sample visualization that you can build in QuickSight using the two Athena views from the previous sections. You can see that this particular database sits idle most of the time, with occasional spikes triggered by users (and by PostgreSQL administrative tasks, which are reported as the user rdsadmin).

quicksight report dashboard

If your cost pattern looks like this, you could consider moving your database to Amazon Aurora Serverless to maximize the advantage of paying only for your actual usage.

You can now split the Amazon RDS compute costs across different tenants. You can also change the aggregation level in QuickSight to produce reports at a different granularity, such as by week or by month.

Clean up

To avoid incurring unnecessary charges, delete the resources you created:

  1. Delete the QuickSight data sources and dashboards.
  2. Delete the CloudFormation stacks.
  3. Delete the Athena views and workgroup if you created them separately.

Conclusion

This solution provides insights into your multi-tenant Amazon RDS costs and resource utilization. By using Performance Insights and CUR, you can now achieve the following:

  • Implement more accurate cost allocation models
  • Identify resource-intensive tenants
  • Optimize workloads based on actual usage patterns
  • Make data-driven decisions for capacity planning and pricing strategies

Consider expanding this solution in the following ways:

  • Create custom dashboards to visualize your cost data
  • Integrate with your billing systems for automated chargeback
  • Set up alerts for when tenants exceed predefined resource thresholds

Start building today with AWS and Amazon RDS, and feel free to contact us by leaving your comments.


About the authors

Davide Coccia is a Technical Account Manager at Amazon Web Services focused on helping customers build, deploy, and run secure, resilient, and cost-effective workloads on AWS. He has a background in analytics consulting. Outside of work, he keeps himself busy with running, playing football (aka soccer), and learning new things.

Andrea Filippo La Scola is a Partner Solutions Architect at AWS, specializing in data analytics and serverless architectures. He helps AWS partners and customers across Italy design innovative solutions using AWS services.