AWS Database Blog

Upgrade your Amazon DocumentDB clusters using AWS DMS

In the fast-paced digital landscape, businesses rely heavily on their data stores to manage critical information, so those data stores must run without interruption. As technology evolves, applications become mission critical, and so do the databases that power them. This necessitates periodic upgrades to maintain optimal performance and security. Although upgrades offer enhanced capabilities, the challenge lies in keeping operations uninterrupted during the process.

In this post, we explore the Amazon DocumentDB (with MongoDB compatibility) version upgrade paths, focusing on the essential considerations when upgrading Amazon DocumentDB 3.6/4.0 to 5.0 using AWS Database Migration Service (AWS DMS).

Upgrade paths

Amazon DocumentDB provides a range of upgrade paths that address diverse needs and scenarios, offering flexibility and options to ensure an efficient upgrade process.

As of this writing, you can upgrade your Amazon DocumentDB cluster using command line utilities like mongodump and mongorestore. The mongodump utility creates a binary backup of the database. You can then use the mongorestore utility to restore that backup to a new Amazon DocumentDB 5.0 cluster. Refer to Dumping, Restoring, Importing, and Exporting Data for additional details. The backup and restore time depends on the size of your dataset, which could potentially create longer downtime if you need to upgrade large clusters. This method is ideal for lower environments (dev and test) and for small to medium datasets when a large enough maintenance window exists for the migration.

Alternatively, Amazon DocumentDB provides data portability and supports migration through AWS DMS. AWS DMS is a managed migration service that helps move your database workloads to and within AWS, with minimal downtime.

AWS DMS makes it straightforward to migrate earlier Amazon DocumentDB versions, relational databases, and non-relational databases to your target Amazon DocumentDB cluster. With AWS DMS, you don’t need to install agents or alter your source database; you can manage and configure it on the AWS Management Console. AWS DMS is a managed service and can be used to perform the following actions:

  • Migrate existing data from a source database to a target database
  • Replicate ongoing database changes

Prerequisites

We assume that you have the following prerequisites:

  • Background knowledge regarding Amazon DocumentDB and AWS DMS
  • The appropriate permissions to interact with resources in your AWS account

This solution involves setting up and using AWS resources, so it will incur costs in your account. Refer to AWS Pricing for more information. We strongly recommend that you set this up in a non-production instance and run end-to-end validations before you implement this solution in a production environment.

Discovery

When upgrading to an Amazon DocumentDB cluster of a higher version, check for any deprecated features and operators or changes in usage methods. Run the application against the newer version and make sure that its behavior and performance are the same as with the previous version, unless you have intentionally modified the application.

Target cluster considerations

As a best practice for migrations, we recommend sizing up the target cluster during the migration itself (to speed it up) and then sizing it down for the validation process. How far to scale up, and how much that shortens the overall migration time, should be validated during trial runs before the actual migration.

Upgrade with AWS DMS

Upgrades require a proven mechanism to accurately sync data and cut over to the target cluster with minimal downtime. AWS DMS enables this with two key components:

  • Full load – You can replicate your entire cluster or specific collections using AWS DMS. To achieve faster migration time, the full load can synchronize multiple collections simultaneously and parallelize the load process by using the AWS DMS segmentation functionality.
  • Change data capture – After the initial data load, you need to keep the source and target clusters in sync until the actual cutover. AWS DMS change data capture (CDC) uses the Amazon DocumentDB change stream to ensure all changes are mirrored to the target Amazon DocumentDB cluster.

Enable the change stream on the source cluster

To perform an upgrade, AWS DMS requires access to the source cluster’s change stream to provide a time-ordered sequence of update events that occur within your cluster’s collections. Change streams enable AWS DMS to perform CDC and apply incremental changes to the target Amazon DocumentDB cluster. Run the following command on the source cluster, replacing the database and collection names accordingly:

db.adminCommand({modifyChangeStreams: 1,
    database: "<DBName>",
    collection: "<Collection Name>",
    enable: true});

Repeat this for each collection you want to replicate.
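When many collections are involved, the same command document can be generated programmatically. The following is a minimal Python sketch and not part of the original walkthrough: the helper name build_enable_commands and the sample database and collection names are hypothetical. It only builds the command documents; sending them to the cluster (for example, through a MongoDB driver's admin command call) is left out.

```python
from typing import Dict, Iterable, List, Tuple


def build_enable_commands(
    targets: Iterable[Tuple[str, str]], enable: bool = True
) -> List[Dict[str, object]]:
    """Build one modifyChangeStreams command document per (database, collection)."""
    commands = []
    for database, collection in targets:
        # The command name is inserted first so that it stays the first key.
        commands.append(
            {
                "modifyChangeStreams": 1,
                "database": database,
                "collection": collection,
                "enable": enable,
            }
        )
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    pairs = [("sales", "orders"), ("sales", "customers")]
    for command in build_enable_commands(pairs):
        print(command)
```

Calling the helper with enable=False produces the matching documents for turning change streams off again after the migration.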

Update change stream retention duration

Based on your transactions, network connectivity, and frequency of changes, you can set the log retention duration appropriately. The change stream log retention duration can be set to a value between 1 hour and 7 days. The default value is 3 hours. For example, if you expect your Amazon DocumentDB cluster migration using AWS DMS to take 12 hours, you should set the change stream retention to 18–24 hours, to account for any other operational overhead. For simplicity, you can pick the maximum retention duration. Setting the log retention to a lower value can result in missed transactions in the target DocumentDB cluster.
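The sizing rule above can be sketched as a small calculation. This is an illustration only: it assumes the retention parameter is expressed in seconds (verify this against the current Amazon DocumentDB documentation), and the function name and the default safety factor of 2 are hypothetical choices, not recommendations from this post.

```python
MIN_RETENTION_SECONDS = 3600           # 1 hour, the lower bound stated above
MAX_RETENTION_SECONDS = 7 * 24 * 3600  # 7 days, the upper bound stated above


def retention_seconds(expected_migration_hours: float, safety_factor: float = 2.0) -> int:
    """Return a retention value in seconds, clamped to the allowed range."""
    if expected_migration_hours <= 0 or safety_factor < 1:
        raise ValueError("expected hours must be positive and safety factor at least 1")
    wanted = int(round(expected_migration_hours * 3600 * safety_factor))
    return max(MIN_RETENTION_SECONDS, min(MAX_RETENTION_SECONDS, wanted))


if __name__ == "__main__":
    # A 12-hour migration with factors of 1.5 and 2 gives 18 and 24 hours.
    print(retention_seconds(12, 1.5), retention_seconds(12, 2.0))
```

For the 12-hour example, factors of 1.5 and 2 correspond to the 18 to 24 hour range mentioned above; very long migrations are capped at the 7-day maximum.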

You can set the change stream retention by modifying the value of the change_stream_log_retention_duration parameter in the cluster parameter group that is associated with your cluster. If you’re using the AWS Command Line Interface (AWS CLI), run the following command:

aws docdb modify-db-cluster-parameter-group \
--db-cluster-parameter-group-name sample-parameter-group \
--parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=<parameter-value>,ApplyMethod=immediate"

Note that change streams incur I/O and storage costs, so make sure to turn them off in the source cluster after the migration if you no longer need them. For more details, refer to Amazon DocumentDB (with MongoDB compatibility) pricing.

Create indexes in the target cluster

You can use the Amazon DocumentDB Index Tool to create your indexes in your target cluster to match the ones in your source cluster. Indexes should be created before the data is loaded. Generating an index after a data load will result in longer processing time.

First, dump indexes from your source cluster using the following command:

python migrationtools/documentdb_index_tool.py --dump-indexes --uri \
'mongodb://sample-user:user-password@sample-source-cluster.node.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false' \
--dir ~/index.js/

After your indexes are successfully exported, you can restore those indexes in your target cluster:

python migrationtools/documentdb_index_tool.py --restore-indexes --uri \
'mongodb://sample-user:user-password@sample-destination-cluster.node.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false' \
--dir ~/index.js/

To confirm that you have restored the indexes correctly, you can run the following command in the mongo shell, replacing the database and collection names accordingly:

db.getSiblingDB("<DBName>").getCollection("<Collection Name>").getIndexes()
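If you collect the index listings from both clusters, the comparison itself can be automated. The following is a minimal Python sketch under stated assumptions: each index is represented as a dictionary with a "key" field, as in the shell output, and the function names and sample data are hypothetical. It compares only key specifications, not index options such as uniqueness or TTL.

```python
from typing import Dict, Iterable, List, Set, Tuple

IndexSignature = Tuple[Tuple[str, object], ...]


def index_signatures(indexes: Iterable[Dict]) -> Set[IndexSignature]:
    """Reduce index definitions to their ordered key specifications."""
    return {tuple(index["key"].items()) for index in indexes}


def missing_indexes(source: Iterable[Dict], target: Iterable[Dict]) -> List[IndexSignature]:
    """Return key specifications present on the source but absent on the target."""
    return sorted(index_signatures(source) - index_signatures(target), key=repr)


if __name__ == "__main__":
    # Hypothetical index listings, shaped like getIndexes() output.
    source = [
        {"v": 2, "key": {"_id": 1}, "name": "_id_"},
        {"v": 2, "key": {"customer_id": 1, "created": -1}, "name": "customer_id_1_created_-1"},
    ]
    target = [{"v": 2, "key": {"_id": 1}, "name": "_id_"}]
    print(missing_indexes(source, target))
```

An empty result means every source key specification also exists on the target.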

Create a replication instance

The AWS DMS replication engine is the core software that runs on your replication instance and performs the migration task. The engine version should be 3.5.1 or later.

AWS DMS always creates the replication instance in a VPC. You specify the VPC where your replication instance is located. You can use your default VPC for your account and Region, or you can create a new VPC. AWS DMS replication instances are typically placed in the target VPC rather than the source VPC for performance, reliability, and security reasons. Make sure that the elastic network interface allocated for your replication instance’s VPC is associated with a security group. Also, make sure this security group’s rules let all traffic on all ports leave (egress) the VPC.

To make sure that your replication instance has enough resources to perform the migration, check your instance’s use of CPU, memory, swap files, and IOPS. For more information on monitoring, see AWS Database Migration Service metrics.

To create your replication instance, complete the following steps. For full instructions, refer to Create a AWS DMS Replication Instance.

  1. On the AWS DMS console, choose Replication instances in the navigation pane.
  2. Create a new replication instance.
  3. Provide a name, optional ARN, and optional description.
  4. Choose your instance class and engine version.

Figure 1
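If you prefer to script this step, the console choices map to parameters of the AWS DMS CreateReplicationInstance API. The following Python sketch only assembles and checks the request; the function name, identifiers, instance class, security group, and subnet group shown are hypothetical placeholders, and the actual API call is left as a comment because it requires boto3 and AWS credentials.

```python
from typing import Dict, List, Tuple

MINIMUM_ENGINE_VERSION = (3, 5, 1)  # minimum engine version stated in this post


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.5.1' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def replication_instance_request(
    identifier: str,
    instance_class: str,
    engine_version: str,
    security_group_ids: List[str],
    subnet_group_id: str,
) -> Dict[str, object]:
    """Build keyword arguments for the DMS CreateReplicationInstance operation."""
    if parse_version(engine_version) < MINIMUM_ENGINE_VERSION:
        raise ValueError("engine version should be 3.5.1 or later")
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "EngineVersion": engine_version,
        "VpcSecurityGroupIds": security_group_ids,
        "ReplicationSubnetGroupIdentifier": subnet_group_id,
        "PubliclyAccessible": False,
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    request = replication_instance_request(
        "docdb-upgrade-instance", "dms.r5.large", "3.5.1",
        ["sg-0123456789abcdef0"], "sample-subnet-group",
    )
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("dms").create_replication_instance(**request)
    print(request)
```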

Configure AWS DMS endpoints

An AWS DMS task is where the actual migration happens. You can specify entire collections or schemas, or select the ones you prefer. AWS DMS uses an endpoint to access the source and target databases and migrate data from the source to the target endpoint. Complete the following steps to create a source endpoint and target endpoint in the replication instance:

  1. On the AWS DMS console, under Migrate data in the navigation pane, choose Endpoints.
  2. Choose Create endpoint.

Figure 2

  3. For Endpoint type, select Source endpoint.
  4. For Endpoint identifier, enter a name for your endpoint.
  5. For Source engine, choose Amazon DocumentDB (with MongoDB compatibility).
  6. For Access to endpoint database, select Provide access information manually and enter the source server name, port, user name, and password.

Figure 3

  7. If you have TLS enabled in your Amazon DocumentDB cluster, set the Secure Socket Layer (SSL) enforcement type to verify-full and add the CA certificate that is associated with your cluster.

You can get that information from the Connectivity & security tab of your Amazon DocumentDB cluster.

Figure 4

  8. Expand the Test endpoint connection section.
  9. Specify your VPC and replication instance.
  10. Choose Run test.

Figure 5

  11. When the test completes successfully, choose Create endpoint.
  12. Repeat these steps to create the target endpoint (select Target endpoint for Endpoint type).
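The same endpoint settings can be expressed as a request to the AWS DMS CreateEndpoint API. The following Python sketch builds the request for a source or target endpoint; the function name, host name, credentials, and certificate ARN are hypothetical placeholders, and additional settings (such as a database name) may be needed for your setup. The API call itself is shown only as a comment.

```python
from typing import Dict, Optional


def docdb_endpoint_request(
    identifier: str,
    endpoint_type: str,
    server_name: str,
    username: str,
    password: str,
    port: int = 27017,
    certificate_arn: Optional[str] = None,
) -> Dict[str, object]:
    """Build keyword arguments for the DMS CreateEndpoint operation."""
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    request: Dict[str, object] = {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": "docdb",
        "ServerName": server_name,
        "Port": port,
        "Username": username,
        "Password": password,
        "SslMode": "none",
    }
    if certificate_arn:
        # TLS-enabled cluster: verify the server against the imported CA certificate.
        request["SslMode"] = "verify-full"
        request["CertificateArn"] = certificate_arn
    return request


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = docdb_endpoint_request(
        "docdb-source", "source",
        "sample-source-cluster.node.us-east-1.docdb.amazonaws.com",
        "sample-user", "user-password",
        certificate_arn="arn:aws:dms:us-east-1:111122223333:cert:EXAMPLE",
    )
    # With boto3 installed and credentials configured:
    #   boto3.client("dms").create_endpoint(**request)
    print({k: v for k, v in request.items() if k != "Password"})
```

In practice, avoid hardcoding the password; the sketch passes it inline only to stay short.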

Create an AWS DMS migration task

Migration tasks tell the replication instance what data needs to be copied over to the target cluster. To create your AWS DMS migration task, complete the following steps:

  1. On the AWS DMS console, under Migrate data in the navigation pane, choose Database migration tasks.
  2. Choose Create task.

Figure 6

  3. Enter a task identifier.
  4. Choose the replication instance you created.
  5. Choose the source and target endpoints.
  6. For Migration type, choose Migrate existing data and replicate ongoing changes.

Figure 7

  7. For Task setting, select Drop tables on target.
  8. For Task logs, select Turn on CloudWatch logs.
  9. In the Table mappings section, choose Add new selection rule.
  10. For Source name, enter the name of your database.
  11. For Source table name, enter the name of your collection.

You can either use specific names or wildcard matching to select multiple collections and databases.

Figure 8

  12. Leave Enable premigration assessment unchecked.

For large tables, consider splitting them into separate tasks and using segmentation. You can use the AWS DMS Segment Analyzer to generate manual segmentation boundaries.

  13. Choose Create task.

After a few minutes, you should see your task running in the state “Load complete, replication ongoing.” To check the progress of the data migration, open the Table statistics tab of the migration task.
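The console selections above correspond to a table-mappings JSON document and a few task settings. The following Python sketch shows one way to assemble them for the AWS DMS CreateReplicationTask API; the function names and the database, collection, and ARN values are hypothetical, segmentation settings are not covered, and the API call is shown only as a comment.

```python
import json
from typing import Dict, Iterable, Tuple


def selection_rule(rule_id: int, database: str, collection: str) -> Dict[str, object]:
    """Build one include rule; '%' acts as a wildcard in either name."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": database, "table-name": collection},
        "rule-action": "include",
    }


def table_mappings(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize selection rules for the given (database, collection) pairs."""
    rules = [selection_rule(i, db, coll) for i, (db, coll) in enumerate(pairs, start=1)]
    return json.dumps({"rules": rules}, indent=2)


def task_request(task_id: str, source_arn: str, target_arn: str,
                 instance_arn: str, mappings: str) -> Dict[str, str]:
    """Build keyword arguments for the DMS CreateReplicationTask operation."""
    settings = {
        "FullLoadSettings": {"TargetTablePrepMode": "DROP_AND_CREATE"},
        "Logging": {"EnableLogging": True},
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",  # existing data plus ongoing changes
        "TableMappings": mappings,
        "ReplicationTaskSettings": json.dumps(settings),
    }


if __name__ == "__main__":
    # Placeholder names and ARNs for illustration only.
    mappings = table_mappings([("sales", "orders"), ("inventory", "%")])
    request = task_request("docdb-upgrade-task", "arn:source", "arn:target",
                           "arn:instance", mappings)
    # With boto3 installed and credentials configured:
    #   boto3.client("dms").create_replication_task(**request)
    print(request["TableMappings"])
```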

Monitoring

You can use Amazon CloudWatch logs to monitor the progress of your AWS DMS task and related resource usage. Metrics like CPU, memory, and IOPS usage play a critical role in a successful migration. Refer to AWS DMS key troubleshooting metrics and performance enhancers for additional details.

Validation

To perform data validation, you can use the Amazon DocumentDB DataDiffer Tool, which checks both document counts and content. Keep in mind that this process may be time-consuming, so it’s important to allocate sufficient time during the planning phase. Alternatively, if you have a custom data validation application available, you may consider using it instead. The DataDiffer tool is provided as an example.
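As a quick first pass before a full content comparison, you can compare per-collection document counts. The following Python sketch is not the DataDiffer tool and does not inspect document content; it assumes you have already collected the counts from each cluster into dictionaries keyed by namespace, and the function name and sample numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def count_mismatches(
    source_counts: Dict[str, int], target_counts: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return namespaces whose counts differ, as (source, target) pairs.

    A namespace missing on one side is reported with None for that side.
    """
    mismatches = {}
    for namespace in sorted(set(source_counts) | set(target_counts)):
        source = source_counts.get(namespace)
        target = target_counts.get(namespace)
        if source != target:
            mismatches[namespace] = (source, target)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts gathered from each cluster.
    source = {"sales.orders": 1200, "sales.customers": 300}
    target = {"sales.orders": 1200, "sales.customers": 298}
    print(count_mismatches(source, target))
```

While replication is still applying changes, counts can differ briefly, so a comparison is most meaningful after writes to the source have stopped.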

If you find a mismatch between the source and the target, consider enabling audit logging on both clusters to detect data mismatches and isolate operations that aren’t functioning as expected. Audit logging has cost and performance trade-offs: it can be resource-intensive, so use it judiciously in production environments, and only when necessary, to minimize any adverse performance impact. Enabling auditing on a cluster is a two-step process:

  1. Modify the audit_logs parameter in the parameter group to all. The audit_logs parameter is a comma-delimited list of events to log.
  2. Enable Amazon DocumentDB to export logs to CloudWatch. The following code modifies the cluster sample-cluster and enables CloudWatch auditing:

aws docdb modify-db-cluster \
--db-cluster-identifier sample-cluster \
--cloudwatch-logs-export-configuration '{"EnableLogTypes":["audit"]}'

Clean up

To avoid unnecessary charges, delete any unused AWS DMS replication instances and, once you no longer need it, the source cluster. After completing the migration process, disable auditing if you enabled it and set your change stream retention back to its prior value. Disable change streams if they weren’t enabled on the source prior to the migration.

Summary

In this post, we demonstrated how to upgrade Amazon DocumentDB 3.6/4.0 to 5.0 with minimal downtime. This method lets you avoid rollbacks because you can validate data before pointing your application to the new cluster. Additionally, you should make sure that dependencies and impacts are identified early, assumptions are confirmed, and risks are clarified through a proof of concept, and you should have a solid plan for the upgrade and application launch.

If you have any comments or questions about this post, submit them in the comments section.


About the Authors

Murali Sankar is a Senior Solutions Architect at AWS based out of New York. He is a tech enthusiast who is passionate about solving complex business problems. He works with organizations to design and develop architectures on AWS and supports their journey on the cloud. Throughout his career, Murali has been dedicated to assisting customers with database-related challenges, including migration, design, and performance optimization.

Cheryl Joseph is a Senior Solutions Architect at AWS. She has a passion for solving technical challenges. She loves guiding others and helping them understand their potential in pursuing careers as tech professionals. She enjoys public speaking. She has over 20 years of experience leading the development of complex, highly robust, and massively scalable software solutions for large-scale enterprise applications.

Sourav Biswas is a Senior DocumentDB Specialist Solutions Architect at Amazon Web Services (AWS). He has been helping Amazon DocumentDB customers successfully adopt the service and implement best practices around it. Before joining AWS, he worked extensively as an application developer and solutions architect for various NoSQL vendors.