AWS Database Blog

Optimize Amazon RDS costs for predictable workloads with automated IOPS and throughput scaling

For many applications, traffic and workload follow predictable patterns tied to dates and events. An e-commerce website may see a surge around holidays, whereas a software as a service (SaaS) app may experience an increase after monthly budget refreshes, and these events can be quite demanding in terms of disk IOPS and throughput.
Rather than paying for high Amazon Relational Database Service (Amazon RDS) storage IOPS and throughput for the whole month, you can optimize Amazon RDS costs by automatically scaling IOPS and throughput to match these predictable spikes.

In this post, we explain how you can use Amazon RDS IOPS and throughput provisioned settings, automate scaling around monthly and seasonal peaks, and decrease settings during slower weeks. By right-sizing IOPS and throughput levels to your workload’s typical cycles, you can reduce Amazon RDS spend while still getting great performance when you need it most.

Solution overview

Our solution uses an AWS Lambda function to scale storage settings up and down according to a schedule defined in the RDS instance’s tags. The workflow contains the following steps and key components:

  • Amazon EventBridge Scheduler is responsible for running tasks on a schedule. For this solution, the scheduler invokes a Lambda function once per day.
  • A Lambda function written in Python scans RDS resources in the account, looking for specific tags associated with the instances.
  • When the function finds these tags, it checks whether the current IOPS setting matches what is expected.
  • The function uses Boto3 calls to Amazon RDS to change IOPS settings when required.
  • Logs are sent to Amazon CloudWatch Logs.

The following diagram illustrates the solution architecture.


Although the examples use an Amazon RDS for MariaDB instance, you can also apply this guidance to RDS for PostgreSQL, RDS for MySQL, RDS for SQL Server, RDS for Oracle, or RDS for Db2.

Amazon Aurora uses a completely different storage architecture and is not covered by the solution proposed in this post.

Limitations

The following are some key limitations of Amazon RDS storage operations:

  • Only one IOPS and throughput change is allowed every 6 hours or until storage optimization completes, whichever is longer. For this reason, choose your period and performance parameters wisely.
  • Consider using Amazon EBS-optimized instances if disk performance is important for the database, and io2 Block Express if you require more performance and durability than gp3 storage can provide. With io2 Block Express, you can achieve up to 256,000 IOPS.
  • For gp3 volumes, changing provisioned IOPS and throughput is only allowed for volumes of 200 GB or larger for Oracle, and 400 GB or larger for Db2, MariaDB, MySQL, and PostgreSQL; no storage allocation limit applies for SQL Server. Below the limit, a fixed baseline of 3,000 IOPS and 125 MiB/s applies and the settings can’t be changed.
  • The baseline is included in the storage price; you only pay extra if you provision more capacity.
  • gp3 storage larger than the limits described above can have provisioned IOPS set in the range of 12,000-64,000 and throughput in the range of 500-4,000 MiB/s, but you must make sure the instance class supports the configured IOPS and throughput; otherwise, the instance class becomes the bottleneck.
  • For io2 Block Express, throughput scales proportionally up to 0.256 MiB/s per provisioned IOPS, and up to 4,000 MiB/s.

For more information about these and other limitations, refer to Working with storage for Amazon RDS DB instances.
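
Because of these boundaries, it can be useful to validate a requested setting before calling the Amazon RDS API. The following is a minimal sketch of such a check for gp3 volumes; the helper name, the engine keys, and the thresholds simply restate the limits listed above and are not part of the solution code shown later.

# Illustrative helper (not part of the solution code): check whether a
# requested gp3 IOPS/throughput combination is inside the limits above.
GP3_MIN_STORAGE_GB = {"oracle": 200, "db2": 400, "mariadb": 400, "mysql": 400, "postgres": 400}

def gp3_change_allowed(engine, allocated_storage_gb, iops, throughput_mibps):
    """Return True if the requested gp3 settings look valid for this engine."""
    min_storage = GP3_MIN_STORAGE_GB.get(engine)  # SQL Server has no minimum
    if min_storage is not None and allocated_storage_gb < min_storage:
        # Below the threshold, the fixed 3,000 IOPS / 125 MiB/s baseline applies
        return False
    return 12000 <= iops <= 64000 and 500 <= throughput_mibps <= 4000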

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create the Lambda function

Complete the following steps to create your Lambda function. For simplicity, the following example covers how to automate IOPS scaling, but you can adapt the code to scale throughput as well (a sketch for throughput scaling follows at the end of this section).

  1. Create an AWS Identity and Access Management (IAM) Lambda execution role to give the Lambda function permission to change the RDS instance’s storage parameters. The following privileges are needed, including permissions required to log messages in Amazon CloudWatch:
    • rds:DescribeDBInstances
    • rds:ListTagsForResource
    • rds:ModifyDBInstance
    • logs:CreateLogGroup
    • logs:CreateLogStream
    • logs:PutLogEvents

The following is an example IAM policy that provides the required permissions (note that the account ID in the example must be changed to match your account ID):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",
                "rds:ListTagsForResource",
                "rds:ModifyDBInstance"
            ],
            "Resource": "arn:aws:rds:*:123456789012:db:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:123456789012:*"
        }
    ]
}
  2. Create a Lambda function in Python and attach the previously created IAM role. Configure the Lambda function to do the following:
    • Use the Boto3 RDS client to call describe_db_instances() to get all RDS instances.
    • Loop through the instances and check if they have the following tags:
      • IOPSHighSetting – The number of IOPS to be set when the instance is in high demand (inside the interval).
      • IOPSLowSetting – The number of IOPS to be set when the instance is in low demand (outside the interval).
      • IOPSHighDayInterval – The interval of days in which the database is in high demand.
        For example, if set to 5-10, the high setting is applied when the automation runs between the fifth and tenth day of each month; outside this period, the low setting is applied.
    • If tags exist, parse the IOPSHighDayInterval tag to get the start and end days of the high IOPS period.
    • Get the current day of the month and check if it falls in the high IOPS period.

The following code works for scaling IOPS:

import os
import logging
from datetime import datetime

import boto3

logger = logging.getLogger()
logger.setLevel("INFO")
rds = boto3.client('rds')

def lambda_handler(event, context):
    logger.info('## ENVIRONMENT VARIABLES')
    logger.info(os.environ['AWS_LAMBDA_LOG_GROUP_NAME'])
    logger.info(os.environ['AWS_LAMBDA_LOG_STREAM_NAME'])
    logger.info('## EVENT')
    logger.info(event)
    # Get current day of month
    today = datetime.today().day
    # Get list of RDS instances
    response = rds.describe_db_instances()
    for db in response['DBInstances']:
        # Instances without provisioned IOPS (for example, gp2) have no 'Iops' key
        curr_iops = int(db.get('Iops', 0))
        arn = db['DBInstanceArn']
        tags = rds.list_tags_for_resource(ResourceName=arn)
        # Build a dictionary from the instance tags
        tag_dict = {tag['Key']: tag['Value'] for tag in tags['TagList']}
        logger.info('Processing database {0}'.format(arn))
        if 'IOPSHighSetting' in tag_dict and 'IOPSLowSetting' in tag_dict and 'IOPSHighDayInterval' in tag_dict:
            logger.info('IOPS tags found')
            high_iops = int(tag_dict['IOPSHighSetting'])
            low_iops = int(tag_dict['IOPSLowSetting'])
            interval_str = tag_dict['IOPSHighDayInterval']
            # Parse interval, for example "5-10"
            interval_days = [int(x) for x in interval_str.split("-")]
            interval_start = interval_days[0]
            interval_end = interval_days[1]
            # Check if the current day is within the high-demand interval
            if interval_start <= today <= interval_end:
                iops = high_iops
            else:
                iops = low_iops
            if curr_iops == iops:
                logger.info('No change needed, database IOPS already set to {0}'.format(str(iops)))
            else:
                logger.info('Changing database IOPS from {0} to {1}'.format(curr_iops, str(iops)))
                # Modify IOPS
                try:
                    rds.modify_db_instance(
                        DBInstanceIdentifier=db['DBInstanceIdentifier'],
                        AllocatedStorage=db['AllocatedStorage'],
                        Iops=iops,
                        ApplyImmediately=True
                    )
                except Exception as error:
                    logger.error('Could not apply change to database {0}: {1}'.format(db['DBInstanceIdentifier'], error))
        else:
            logger.info('No IOPS tags found')
  3. To invoke the Lambda function daily at midnight, create an EventBridge Scheduler schedule with the following cron expression:

cron(0 0 * * ? *)

This triggers the Lambda function every day at 00:00 UTC, but you can choose a more appropriate time for the change. There won’t be any downtime during the operation.

Make sure the Lambda function has a long enough timeout and enough memory allocated to complete the required actions. The default of 3 seconds and 128 MB works in a test environment with one RDS instance, but might not be sufficient with multiple instances.
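
The function above only adjusts IOPS. If you also want to scale throughput, one way to adapt it is to read two additional tags and pass StorageThroughput to modify_db_instance. The following sketch uses hypothetical tag names (ThroughputHighSetting and ThroughputLowSetting) and reuses the variables from the function above; treat it as a starting point rather than a complete implementation.

# Sketch: additional logic for throughput scaling using hypothetical tags.
# This would replace the modify_db_instance call in the function above.
if 'ThroughputHighSetting' in tag_dict and 'ThroughputLowSetting' in tag_dict:
    high_tp = int(tag_dict['ThroughputHighSetting'])
    low_tp = int(tag_dict['ThroughputLowSetting'])
    throughput = high_tp if interval_start <= today <= interval_end else low_tp
    rds.modify_db_instance(
        DBInstanceIdentifier=db['DBInstanceIdentifier'],
        AllocatedStorage=db['AllocatedStorage'],
        Iops=iops,
        StorageThroughput=throughput,  # MiB/s, supported for gp3 storage
        ApplyImmediately=True
    )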

Test the Lambda function automation

Complete the following steps to test the automation:

  1. Create an RDS instance with the three tags set (IOPSHighSetting, IOPSLowSetting, IOPSHighDayInterval). Consult Amazon RDS DB instance storage to make sure the settings are within the IOPS boundaries for your instance and storage type, as discussed earlier.
  2. On the Lambda console, create a test event. For the provided code, no special variables are needed in the test event.
  3. Run the test event and review the output.
    In the previous example, our RDS instance was already set with the expected IOPS setting for today.
  4. Let’s change the tag IOPSHighDayInterval to include today in the high demand interval.
  5. If you run the Lambda test event again, you can see the IOPS settings being changed for your instance.

The instance’s status changes to Modifying. The change can take a while to complete.

After Modifying, the status changes to Storage-optimization. No new changes can be made to the storage for at least 6 hours. While the instance is in either of these statuses, if you change the interval again and run the Lambda function, you get the following error: An error occurred (InvalidParameterCombination) when calling the ModifyDBInstance operation: You can’t currently modify the storage of this DB instance because the previous storage change is being optimized.

For more details about storage optimization, refer to Working with storage for Amazon RDS DB instances.

If this were a real scenario and you had EventBridge set up, the Lambda function would try to run the change again the following day.
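
If you prefer the function to skip instances that are still being optimized instead of logging this error, you can add a simple guard on the instance status before attempting the modification. The following is a minimal sketch that could be placed at the top of the instance loop in the function shown earlier; the status values are those reported by describe_db_instances.

# Sketch: skip instances whose storage can't be modified yet.
# 'modifying' and 'storage-optimization' are reported by describe_db_instances
# while a previous change is still in progress.
status = db['DBInstanceStatus']
if status in ('modifying', 'storage-optimization'):
    logger.info('Skipping {0}: instance status is {1}'.format(db['DBInstanceIdentifier'], status))
    continue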

Schedule the Lambda function to run daily

To adjust the storage settings according to the scheduler tags, the Lambda function has to run daily. You can achieve this by creating a schedule in EventBridge Scheduler. Complete the following steps:

  1. On the EventBridge console, under Scheduler in the navigation pane, choose Schedules.
  2. Choose Create schedule.
  3. Provide the rule details, including the rule name, optional description, and event bus.
  4. For Rule type, select Schedule.
  5. Choose Continue in EventBridge Scheduler.
  6. For Occurrence, select Recurring schedule.
  7. For Schedule type, select Cron-based schedule.
  8. For Cron expression, enter the cron parameters. For this post, we schedule the action to run daily at midnight.
  9. Choose a flexible time window of 15 minutes, then choose Next.
  10. For Select target, select Invoke AWS Lambda.
  11. For Lambda function, choose your Lambda function.
  12. Leave Payload blank, then choose Next.
  13. For Action after schedule completion, select NONE.
  14. In the Permissions section, select Create a new role for the schedule and enter a role name, unless you have a specific role created already.
  15. Choose Next.
  16. Confirm everything is as expected and choose Create schedule.

The Lambda function will run as scheduled. To monitor the run, refer to Monitoring Amazon EventBridge.
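
If you prefer to create the schedule programmatically instead of through the console, an equivalent Boto3 call looks roughly like the following. The schedule name, Lambda function ARN, and role ARN are placeholders to replace with your own values; the role must allow EventBridge Scheduler to invoke your Lambda function.

import boto3

scheduler = boto3.client('scheduler')

# Placeholders: replace the schedule name and ARNs with your own values.
scheduler.create_schedule(
    Name='rds-iops-scaling-daily',
    ScheduleExpression='cron(0 0 * * ? *)',
    FlexibleTimeWindow={'Mode': 'FLEXIBLE', 'MaximumWindowInMinutes': 15},
    Target={
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:rds-iops-scaling',
        'RoleArn': 'arn:aws:iam::123456789012:role/scheduler-invoke-lambda'
    }
)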

Clean up

If you followed this guide to create a test environment, delete the following resources to avoid ongoing costs:

  • The RDS instance created for testing
  • The Lambda function and its CloudWatch Logs log group
  • The IAM roles created for the Lambda function and for EventBridge Scheduler
  • The EventBridge Scheduler schedule

Conclusion

In this post, we outlined a method for automatically adjusting storage performance parameters based on a schedule defined in RDS instance tags. The solution is based on an EventBridge Scheduler schedule and a Lambda function written in Python, and can reduce costs while maintaining performance during peak periods. The solution can scale not only IOPS but also throughput, depending on your requirements.

Leave your thoughts or questions in the comments section.


About the Author

Ivan Schuster is a Senior Database Specialty Architect at AWS. He has over 20 years of experience in the technology industry, mostly with databases. In his role as a professional services consultant, Ivan has supported multiple customers in transitioning their workloads to the AWS Cloud.