AWS Database Blog
Cross-account replication with Amazon DynamoDB
July 2024: This post has been reviewed for accuracy.
Hundreds of thousands of customers use Amazon DynamoDB for mission-critical workloads. In some situations, you may want to migrate your DynamoDB tables to a different AWS account, for example, when one company acquires another. Another use case is adopting a multi-account strategy, in which you have a separate account and want to replicate production DynamoDB data to it for development purposes. Finally, for disaster recovery, you can use DynamoDB global tables to replicate your DynamoDB tables automatically across different AWS Regions, achieving a sub-minute recovery time objective (RTO) and recovery point objective (RPO). However, you might want to replicate not only to a different Region, but also to another AWS account.
In this post, we cover a cost-effective method to migrate and synchronize DynamoDB tables across accounts with no impact on the performance or availability of the source table.
Overview of solution
We split this post into two main sections: initial migration and ongoing replication. We complete the initial migration by using a feature that allows us to export DynamoDB tables to any Amazon Simple Storage Service (Amazon S3) bucket and then import the data from Amazon S3 into a new DynamoDB table. For ongoing replication, we use Amazon DynamoDB Streams and AWS Lambda to replicate any subsequent inserts, updates, and deletes. The following diagram illustrates this architecture.
Initial migration
The native export feature leverages the point-in-time recovery (PITR) capability in DynamoDB and allows us to export a 1.3 TB table in a matter of minutes without consuming any read capacity units (RCUs), which is considerably faster and more cost-effective than what was possible before its release.
Exporting the table with the native export feature
To export the DynamoDB table to a different account using the native export feature, we first need to grant the proper permissions by attaching two AWS Identity and Access Management (IAM) policies: an S3 bucket policy in the target account and an identity-based policy on the IAM user in the source account who performs the export, both allowing write and list permissions. Additionally, the AWS CloudFormation template from the GitHub repo creates the IAM roles with the required permissions to perform the export of the source DynamoDB table.
The following code is the S3 bucket policy (target account):
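This is a minimal sketch of such a policy, applied here with boto3; the source account ID, bucket name, and statement IDs are placeholders, and the principal can be scoped down to the specific IAM user or role that runs the export.

```python
import json

import boto3

# Placeholders: replace with your source account ID and target bucket name.
SOURCE_ACCOUNT_ID = "111111111111"
TARGET_BUCKET = "cross-account-ddb-export-bucket"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowExportListBucket",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{SOURCE_ACCOUNT_ID}:root"},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{TARGET_BUCKET}",
        },
        {
            "Sid": "AllowExportWrites",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{SOURCE_ACCOUNT_ID}:root"},
            "Action": ["s3:PutObject", "s3:PutObjectAcl", "s3:AbortMultipartUpload"],
            "Resource": f"arn:aws:s3:::{TARGET_BUCKET}/*",
        },
    ],
}

# Attach the policy to the export bucket in the target account.
boto3.client("s3").put_bucket_policy(
    Bucket=TARGET_BUCKET, Policy=json.dumps(bucket_policy)
)
```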
The following code is the IAM user policy (source account):
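Similarly, this is a minimal sketch of an identity-based policy for the user in the source account, attached here as an inline policy with boto3; the table ARN, bucket name, user name, and policy name are placeholders.

```python
import json

import boto3

# Placeholders: replace with your source table ARN, target bucket, and IAM user.
SOURCE_TABLE_ARN = "arn:aws:dynamodb:eu-west-1:111111111111:table/SourceTable"
TARGET_BUCKET = "cross-account-ddb-export-bucket"
EXPORT_USER = "dynamodb-export-user"

user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowTableExport",
            "Effect": "Allow",
            "Action": "dynamodb:ExportTableToPointInTime",
            "Resource": SOURCE_TABLE_ARN,
        },
        {
            "Sid": "AllowWriteAndListOnTargetBucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:AbortMultipartUpload",
            ],
            "Resource": [
                f"arn:aws:s3:::{TARGET_BUCKET}",
                f"arn:aws:s3:::{TARGET_BUCKET}/*",
            ],
        },
    ],
}

# Attach the policy inline to the IAM user who performs the export.
boto3.client("iam").put_user_policy(
    UserName=EXPORT_USER,
    PolicyName="CrossAccountDynamoDBExport",
    PolicyDocument=json.dumps(user_policy),
)
```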
For more information on the export feature, see New – Export Amazon DynamoDB Table Data to Your Data Lake in Amazon S3, No Code Writing Required. When doing the export, you can choose either DynamoDB JSON or Amazon Ion as the output format. In this post, we choose DynamoDB JSON.
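With the permissions in place, the export itself is a single API call. The following boto3 sketch shows the shape of that call; the table ARN, bucket, prefix, and account ID are placeholders, and in this solution the provided script issues the equivalent request for you.

```python
import boto3

dynamodb = boto3.client("dynamodb")  # source account credentials

# PITR must be enabled on the source table before it can be exported.
response = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:eu-west-1:111111111111:table/SourceTable",
    S3Bucket="cross-account-ddb-export-bucket",
    S3BucketOwner="222222222222",  # target account that owns the bucket
    S3Prefix="exports/SourceTable",
    ExportFormat="DYNAMODB_JSON",  # DynamoDB JSON, as chosen in this post
)

print("Export started:", response["ExportDescription"]["ExportArn"])
```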
The files are exported to an S3 location of the form s3://&lt;bucket&gt;/&lt;prefix&gt;/AWSDynamoDB/&lt;export ID&gt;/data/, along with manifest files that describe the export.
Importing the table
Now that we have our data exported to the Amazon S3 bucket in the target account, we can use the native import from Amazon S3 feature to import it into a new DynamoDB table in the target account. However, we must first grant the proper permissions to the user performing the import operation in the target account. Specifically, we need to grant the user GetObject and ListBucket permissions on the source Amazon S3 bucket, as well as the Amazon CloudWatch Logs permissions that the import process uses to publish debugging information. Additionally, we need to grant the user ImportTable, DescribeImport, ListImports, and UpdateTimeToLive permissions on the DynamoDB table.
The following code is the IAM policy required for import operation:
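The sketch below shows one way such a policy can look, attached with boto3 to the user in the target account; the account ID, Region, bucket, and user names are placeholders, and in production you would scope the resources down as tightly as possible.

```python
import json

import boto3

# Placeholders: replace with your values in the target account.
TARGET_ACCOUNT_ID = "222222222222"
TARGET_REGION = "eu-west-1"
EXPORT_BUCKET = "cross-account-ddb-export-bucket"
IMPORT_USER = "dynamodb-import-user"

import_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadFromExportBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{EXPORT_BUCKET}",
                f"arn:aws:s3:::{EXPORT_BUCKET}/*",
            ],
        },
        {
            "Sid": "AllowImport",
            "Effect": "Allow",
            "Action": [
                "dynamodb:ImportTable",
                "dynamodb:DescribeImport",
                "dynamodb:ListImports",
                "dynamodb:UpdateTimeToLive",
            ],
            # Scope this down to the target table ARN (and its import ARNs) where possible.
            "Resource": "*",
        },
        {
            "Sid": "AllowImportLogging",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
            ],
            "Resource": f"arn:aws:logs:{TARGET_REGION}:{TARGET_ACCOUNT_ID}:log-group:/aws-dynamodb/*",
        },
    ],
}

boto3.client("iam").put_user_policy(
    UserName=IMPORT_USER,
    PolicyName="CrossAccountDynamoDBImport",
    PolicyDocument=json.dumps(import_policy),
)
```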
The AWS CloudFormation template from the GitHub repository will create the IAM roles with the required permissions to perform the import operation.
For more information on the import feature, see Amazon DynamoDB can now import Amazon S3 data into a new table.
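Like the export, the import is a single API call once the permissions exist. The following boto3 sketch uses a hypothetical partition key named pk and placeholder bucket, prefix, and table names; in this solution, the provided script builds these parameters from the source table's actual key schema, indexes, and TTL setting.

```python
import boto3

dynamodb = boto3.client("dynamodb")  # target account credentials

response = dynamodb.import_table(
    S3BucketSource={
        "S3Bucket": "cross-account-ddb-export-bucket",
        "S3KeyPrefix": "exports/SourceTable",
    },
    InputFormat="DYNAMODB_JSON",
    InputCompressionType="GZIP",  # DynamoDB JSON exports are gzip-compressed
    TableCreationParameters={
        "TableName": "TargetTable",
        # Hypothetical single-attribute key schema; use the source table's real schema.
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    },
)

import_arn = response["ImportTableDescription"]["ImportArn"]
# Poll DescribeImport until the status reaches COMPLETED.
status = dynamodb.describe_import(ImportArn=import_arn)["ImportTableDescription"]["ImportStatus"]
print("Import status:", status)
```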
Ongoing replication
To ensure data integrity across both tables, the initial (full load) migration should be completed before enabling ongoing replication. In the ongoing replication process, any item-level modifications that happen in the source DynamoDB table during and after the initial migration are captured by DynamoDB Streams. DynamoDB Streams stores these time-ordered records for 24 hours. Then, a Lambda function reads records from the stream and replicates those changes to the target DynamoDB table. The following diagram (option 1) depicts the ongoing replication architecture.
However, if the initial migration takes more than 24 hours, we have to use Amazon Kinesis Data Streams instead, which extends the retention period from 24 hours up to 365 days. The following diagram (option 2) depicts the ongoing replication architecture if we use Kinesis Data Streams as a buffer.
All updates happening on the source table can be automatically copied to a Kinesis data stream using the new Kinesis Data Streams for DynamoDB feature. A Lambda function reads records from the stream and replicates those changes to the target DynamoDB table.
In this post, we use DynamoDB Streams (option 1) to capture the changes on the source table.
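The replication function itself can be small. The following is a simplified sketch of such a handler in Python, not the exact function from the repo: it assumes the target table name and a cross-account IAM role ARN are passed in as (hypothetical) environment variables, and it writes the stream images to the target table as-is. Batching optimizations, retries, and failure handling are omitted for brevity.

```python
import os

import boto3

# Hypothetical environment variables; the repo's function may use different names.
TARGET_TABLE = os.environ["TARGET_TABLE_NAME"]
TARGET_ROLE_ARN = os.environ["TARGET_ACCOUNT_ROLE_ARN"]


def _target_client():
    # Assume a role in the target account so the function can write cross-account.
    creds = boto3.client("sts").assume_role(
        RoleArn=TARGET_ROLE_ARN, RoleSessionName="ddb-cross-account-replication"
    )["Credentials"]
    return boto3.client(
        "dynamodb",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def lambda_handler(event, context):
    dynamodb = _target_client()
    for record in event["Records"]:
        change = record["dynamodb"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is already in DynamoDB JSON, so it can be written as-is.
            # Requires the stream view type to include new images.
            dynamodb.put_item(TableName=TARGET_TABLE, Item=change["NewImage"])
        elif record["eventName"] == "REMOVE":
            dynamodb.delete_item(TableName=TARGET_TABLE, Key=change["Keys"])
```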
Deploying and running the solution
Follow the instructions in the GitHub repo to deploy the AWS CloudFormation template in both the source and target AWS accounts. The template deploys all necessary resources, including AWS IAM roles, AWS Lambda functions, Amazon S3 buckets, and Amazon SQS queues, to facilitate the initial load and ongoing replication. Once the resources are successfully created in both accounts, run the `src/dynamodb-initial-load-and-cdc-setup.py` script in your terminal to start the initial migration and set up ongoing replication. The script performs the following actions; a sketch of the stream-related API calls follows the list:
- Enable DynamoDB Streams on the source table, if not already enabled.
- Export the source DynamoDB table to an S3 bucket in the target account.
- Import data from the S3 bucket to the new DynamoDB table in the target account. If the source table has global secondary indexes (GSI) and local secondary indexes (LSI), the script will create them as well. It will also set up a time-to-live (TTL) attribute if it was enabled on the source table.
- Once the initial migration is complete, it will create an event source mapping with the DynamoDB stream as the trigger for the AWS Lambda function that will handle ongoing replication.
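For illustration, the following boto3 sketch shows the stream-related steps: enabling DynamoDB Streams on the source table and, after the import completes, creating the event source mapping. The table name, function name, and batch size are placeholders; the repo's script performs the equivalent calls for you.

```python
import boto3

dynamodb = boto3.client("dynamodb")      # source account
lambda_client = boto3.client("lambda")   # source account

# Placeholders: source table and the replication function deployed by CloudFormation.
SOURCE_TABLE = "SourceTable"
REPLICATION_FUNCTION = "cross-account-replication-function"

# Enable DynamoDB Streams with new and old images (skip this call if already enabled).
table = dynamodb.update_table(
    TableName=SOURCE_TABLE,
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)["TableDescription"]

# After the initial import completes, wire the stream to the Lambda function.
lambda_client.create_event_source_mapping(
    EventSourceArn=table["LatestStreamArn"],
    FunctionName=REPLICATION_FUNCTION,
    StartingPosition="TRIM_HORIZON",  # replay changes captured during the migration
    BatchSize=100,
)
```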
After the `src/dynamodb-initial-load-and-cdc-setup.py` script finishes, change the write capacity of the target Amazon DynamoDB table to match the write capacity of the source table in the source account.
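For tables that use provisioned capacity, this is a single UpdateTable call; the table name and capacity values below are placeholders that you would copy from a DescribeTable call on the source table.

```python
import boto3

dynamodb = boto3.client("dynamodb")  # target account credentials

# Placeholder capacity values; match them to the source table's settings.
dynamodb.update_table(
    TableName="TargetTable",
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 500},
)
```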
For detailed instructions and additional information, please refer to the README file.
Verifying the number of records in the source and target tables
To verify the number of records in both the source and target tables, check the Item summary section on the DynamoDB console, as shown in the following screenshot.
You can also use the DescribeTable API to determine the number of items and the size of the source and target tables:
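A minimal example with boto3 follows; the table name is a placeholder, and you would run it once against each account.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Placeholder table name; run this in both the source and the target account.
table = dynamodb.describe_table(TableName="TargetTable")["Table"]
print(f"ItemCount: {table['ItemCount']}, TableSizeBytes: {table['TableSizeBytes']}")
```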
DynamoDB updates the size and item count values approximately every 6 hours.
Alternatively, you can run an item count on the DynamoDB console. This operation performs a full scan of the table to retrieve the current size and item count, so it isn't recommended for large tables.
Cleaning up
Delete the resources you created if you no longer need them:
- Disable DynamoDB Streams on the source table.
- Delete the event source mapping for the AWS Lambda function.
- Disable Point-In-Time Recovery (PITR) on the source table.
- Delete the AWS CloudFormation stacks in both the source and target AWS accounts.
Conclusion
In this post, we showed a fast and cost-effective way to migrate DynamoDB tables between AWS accounts, using the native Amazon DynamoDB export and import features for the initial migration and AWS Lambda in conjunction with DynamoDB Streams for ongoing replication. Should you have any questions or suggestions, feel free to reach out to us on GitHub and we can take the conversation further. Until next time, enjoy your cloud journey!
About the Authors
Ahmed Zamzam is a Solutions Architect with Amazon Web Services. He supports SMB customers in the UK in their digital transformation and their cloud journey to AWS, and specializes in Data Analytics. Outside of work, he loves traveling, hiking, and cycling.
Dragos Pisaroc is a Solutions Architect supporting SMB customers in the UK in their cloud journey, and has a special interest in big data and analytics. Outside of work, he loves playing the keyboard and drums, as well as studying psychology and philosophy.
Reviewed and updated by
Rishi Jala is a NoSQL Data Architect with AWS Professional Services. He focuses on architecting and building highly scalable applications using NoSQL databases such as Amazon DynamoDB. Passionate about solving customer problems, he delivers tailored solutions to drive success in the digital landscape.
Corey Cole is a Senior Cloud Infrastructure Architect. He specializes in assisting customers in operating at scale in the cloud. Outside of work, he collects and modifies mechanical watches.
Shyam Arjarapu is a Principal Data Architect at Amazon Web Services and leads Worldwide Professional Services for Amazon DocumentDB and Amazon Timestream. Shyam is passionate about solving customer problems and channels his energy into helping AWS professional service clients build highly scalable applications using Amazon Purpose-built databases. Before joining AWS, Shyam held a similar role at MongoDB and worked as Enterprise Solution Architect at JP Morgan & Chase, SunGard Financial Systems, and John Hancock Financial.