AWS Database Blog
How to migrate BLOB and CLOB tables from Oracle to PostgreSQL using Ora2pg and AWS DMS
Many enterprises are considering migrating their Oracle databases to PostgreSQL because of the close compatibility between the platforms and the enterprises’ desire to reduce licensing costs. Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility make it easy to set up, operate, and scale PostgreSQL deployments in the cloud in a cost-efficient manner.
AWS offers two services to help you migrate Oracle databases running either in the AWS environment or in your on-premises data centers. The AWS Schema Conversion Tool (AWS SCT) helps you convert your existing database schema to the target database schema. If AWS SCT can't convert a schema automatically, it generates an assessment report that tells you how to create an equivalent schema in the target database. AWS Database Migration Service (AWS DMS) helps you migrate the actual data from the source database to the target database.
At times, you might want to migrate large BLOBs and CLOBs outside of AWS DMS. If you are already familiar with the open-source tool Ora2Pg, you can use this tool to bulk load your data and then use AWS DMS for change data capture (CDC).
In this blog post, we discuss how to migrate your Amazon RDS for Oracle database to Amazon RDS for PostgreSQL. We do this by using the open source Ora2Pg tool for the initial load of tables with CLOB column data and using the CDC feature of AWS DMS for ongoing replication. Using Ora2Pg and AWS DMS together accomplishes the migration and sets the stage for ongoing replication.
To test the migration of data from Amazon RDS for Oracle to Amazon RDS for PostgreSQL, we provide an Amazon RDS for Oracle snapshot for you to use. This snapshot contains one schema and user, both with the name `awsorauser`, and five tables. Two tables in this schema (`CUSTOMER` and `PART`) each have a CLOB column. Each CLOB column contains XML content, and the size of the CLOB value in each row is approximately 5 MB. These two tables have 5,000 rows each. In addition, the `SUPPLIER` table has 110,000 rows but doesn't have any CLOB columns. The other two tables, `NATION` and `REGION`, are very small.
Migration overview
At a high level, we do the following in this post:
- Create a virtual private cloud (VPC) based on Amazon VPC with two subnets and a security group.
- Provision an Amazon RDS for Oracle database from a publicly available snapshot.
- Provision and configure an Amazon EC2 instance to load the test data into an Amazon RDS for Oracle database.
- Provision an Amazon RDS for PostgreSQL database.
- Enable archive log mode, supplemental logging, and automatic backups on the source Oracle database.
- Provision an Amazon EC2 instance for installing the Ora2Pg tool.
- Install and configure the Ora2Pg tool.
- Load new data into the source Oracle database to test change data capture (CDC).
- Create AWS DMS tasks with CDC to enable replication from the source RDS for Oracle database to the target RDS for PostgreSQL database.
Prerequisites and assumptions
To perform this exercise, you need the following:
- An Oracle account to download the Oracle client packages.
- An AWS account that provides access to AWS services.
- An AWS Identity and Access Management (IAM) user with an access key and secret access key to configure the AWS CLI.
- For testing purposes, the services that this post works with need to be created in the us-east-1a Availability Zone in the us-east-1 Region.
Additionally, be aware of the following:
- We configure all services in the same VPC to simplify networking considerations.
- The predefined database schema name and password in the source RDS for Oracle database are both `awsorauser`.
- The predefined root user name for the source RDS for Oracle database is `admin`.
- The predefined root password for the source RDS for Oracle database is `admin123`.
- While creating the RDS for PostgreSQL database, use lowercase letters for the database name, master user name, and schema names.
- Important: The AWS CloudFormation templates that we provide use hard-coded user names and passwords and open security groups. These are just for testing purposes and aren’t intended for production use without any modifications.
AWS components that are used in the migration
You need the following AWS components for successful testing of this migration.
| | Component | Purpose |
|---|---|---|
| 1 | Amazon RDS for Oracle instance | The source Oracle DB instance, restored from a provided snapshot. |
| 2 | Amazon RDS for PostgreSQL instance | The target PostgreSQL instance. Our source Oracle database is migrated to this RDS for PostgreSQL instance. |
| 3 | First Amazon EC2 instance (m4.xlarge), tag name `Ora2pg-blog-EC2InstanceFor-SourceDataLoading` | This EC2 instance is used to add new data to the source Oracle database while the initial migration from the source RDS for Oracle database to the target PostgreSQL database is running. It uses Oracle client version 12.2, because that version includes the Oracle SQL*Loader tool (`sqlldr`), which is needed to add new data to the source RDS for Oracle database. |
| 4 | Second Amazon EC2 instance (m4.4xlarge), tag name `Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration` | This EC2 instance is used to move the data from the RDS for Oracle database to the RDS for PostgreSQL database using Ora2pg. We install and configure Ora2pg on this instance. It needs the 11.2 version of the Oracle client instead of 12.2, which is why we use two separate EC2 instances. |
| 5 | AWS DMS instance | This DMS replication instance enables ongoing replication from the source RDS for Oracle database to the target RDS for PostgreSQL database. |
Implementing the migration
In this section, we describe how to implement the migration.
1. Use the AWS CloudFormation template for initial setup
In this step, we set up a VPC, two subnets, and a security group; restore the source RDS for Oracle database from the snapshot; create the target RDS for PostgreSQL database; and create two EC2 instances, one for loading the test data and one for configuring the Ora2pg tool and software.
You can use this downloadable CloudFormation template to set up the VPC security group, source RDS for Oracle database, target RDS for PostgreSQL database, and two EC2 instances. To launch the stack directly through the console, choose the following Launch Stack button:
The CloudFormation template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.
| For this parameter | Use this | Predefined or manual entry? |
|---|---|---|
| Stack name | Provide a unique stack name, for example `ora2pg-dms-stack-1` | Manual entry |
| OracleDatabaseName | AWSORA | Predefined |
| OracleInstanceType | db.m4.xlarge | Predefined |
| DBSnapshotIdentifier | aws-blogs-oracle2postgres-migration-using-ora2pg-dms-source-data | Predefined |
| OracleDBAllocatedStorage | 500 GB | Predefined |
| OracleDBInstanceIdentifier | awsorains | Predefined |
| PostgreSQLDBInstanceIdentifier | awspgins | Predefined |
| PostgreSQLDBInstanceClass | db.m4.xlarge | Predefined |
| PostgreSQLDBName | awspgdb | Predefined |
| PostgreSQLDBUserName | root | Predefined |
| PostgreSQLDBPassword | Rootroot123 | Predefined |
| PostgreSQLDBAllocatedStorage | 500 GB | Predefined |
| KeyName | Provide the name of an existing EC2 key pair | Manual entry |
| ClientIP | Provide the IP address range to use to connect to the RDS and EC2 instances from your local machine. This must be a valid IP CIDR range of the form x.x.x.x/x. Get your address using checkip.amazonaws.com or whatsmyip.org. | Manual entry |
| DataLoadingIntoOracleEC2InstanceType | m4.xlarge | Predefined |
| Ora2PGEC2InstanceType | m4.4xlarge | Predefined |
After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.
The output from the preceding template is as follows.
| Key | Value |
|---|---|
| StackName | Name of the stack |
| VPCID | Vpc-xxxxxxx |
| Regionname | AWS Region. Make sure that it shows us-east-1. |
| SourceRDSOracleEndPoint | Source RDS for Oracle endpoint |
| TargetRDSPostgreSQLEndPoint | Target RDS for PostgreSQL endpoint |
| EC2InstanceForOra2PgInstanceId | Instance ID of the second EC2 instance, where we install and configure the Ora2pg tool |
| EC2InstanceForOra2PgAvailabilityZone | Availability Zone where this EC2 instance is created |
| EC2InstanceForOra2PgPublicDNS | Public DNS address of the second EC2 instance, where we install and configure the Ora2pg tool |
| EC2InstanceForSourceDataLoadingInstanceId | Instance ID of the first EC2 instance. We use this instance to load test data into the source RDS for Oracle database, to simulate data loading while the initial load is running with Ora2pg. |
| EC2InstanceForSourceDataLoadingPublicDNS | Public DNS address of the first EC2 instance |
| EC2InstanceForSourceDataLoadingAZ | us-east-1a, the Availability Zone where this EC2 instance must be created. Select the corresponding subnet associated with your VPC. |
| SubnetID1A | ID of the subnet created in Availability Zone 1A |
| SubnetID1B | ID of the subnet created in Availability Zone 1B |
It takes approximately 15–16 minutes for the deployment to complete. When the stack launch is complete, it returns the output with information about the resources that were created. Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:
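For example, assuming the stack name used earlier (`ora2pg-dms-stack-1`), a command like the following lists the outputs:

```bash
# List the stack outputs (use the stack name you chose)
aws cloudformation describe-stacks \
  --stack-name ora2pg-dms-stack-1 \
  --region us-east-1 \
  --query "Stacks[0].Outputs" \
  --output table
```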
2. Configure the first EC2 instance
Next, configure the first EC2 instance for loading test data into the Amazon RDS for Oracle database. This EC2 instance has the tag name `Ora2pg-blog-EC2InstanceFor-SourceDataLoading`. For this configuration, do the following:
- Download the Oracle instant client .rpm files from the Instant Client Downloads for Linux x86-64 (64-bit) page on the Oracle website to the first EC2 instance that was created using the CloudFormation template. The download process requires you to log in to your own OTN account and accept the license agreement. The files to download are `oracle-instantclient12.2-tools-12.2.0.1.0-1.x86_64.rpm`, `oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm`, and `oracle-instantclient12.2-sqlplus-12.2.0.1.0-1.x86_64.rpm`. Make sure that you download the 12.2 version of the Oracle client .rpm files for this first EC2 instance, because the 12.2 client includes the Oracle SQL*Loader tool, `sqlldr`.
- Copy the .rpm files into the /tmp directory of the first Amazon EC2 instance (the instance with the tag name `Ora2pg-blog-EC2InstanceFor-SourceDataLoading`). This instance is for loading the test data. You can get the EC2 instance's public DNS value from the CloudFormation output key `EC2InstanceForSourceDataLoadingPublicDNS`. If you have trouble connecting to the instance using Secure Shell (SSH), check the instance's security group to make sure that it allows inbound SSH connections (through TCP port 22) from your client's IP address.
- Follow the instructions listed in the AWS CLI Getting Started topic to configure the AWS CLI using the `aws configure` command. The user for this configuration needs to be set as `ec2-user`.
- Download the wrapper script `setting_oracle_client_host.sh` from the Amazon S3 bucket. As the `ec2-user` user, run the provided command to download the configuration scripts.
- Execute the script, providing the Amazon RDS for Oracle endpoint that was created earlier (without the port number and colon), the Availability Zone (`us-east-1a`), and the database name that was selected while restoring the RDS for Oracle snapshot. A hedged example appears after this list. When you execute the wrapper script, it also downloads the additional scripts that are required to create tables, sqlldr control files, and so on. In addition, it attaches an Amazon EBS volume, which is mounted on /local/mnt6. The EBS volume contains the CLOB files that are used to load additional rows into the source RDS for Oracle database while the Ora2pg job is running. This script also enables archive log mode and supplemental logging on the source RDS for Oracle database. Note: When the script execution is complete, it mounts a 25 GB disk that contains the files needed to generate the dataset for loading the test data. You can check this by running the command `df -h`. When you clean up at the end of the exercise, you manually delete this EBS volume.
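The following sketch shows this step end to end. The wrapper script's argument order is an assumption for illustration, and the `rdsadmin` calls are the standard way to set archive log retention and supplemental logging on Amazon RDS for Oracle (the wrapper script performs this work for you):

```bash
# Run the wrapper script (argument order assumed: endpoint, Availability Zone, DB name)
sh setting_oracle_client_host.sh \
   awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com \
   us-east-1a \
   AWSORA

# Verify that the 25 GB data volume is mounted
df -h /local/mnt6

# For reference: on RDS for Oracle, archive log retention and supplemental logging
# can be enabled with rdsadmin calls like these (run as the admin user)
sqlplus admin/admin123@//awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:1521/AWSORA <<'SQL'
exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 24);
exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD');
exit
SQL
```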
3. Configure the second EC2 instance
Next, install and configure the Ora2pg tool on the second Amazon EC2 instance. This EC2 instance has the tag name `Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration`.
To install Ora2pg, we need to configure the Oracle instant client, Comprehensive Perl Archive Network (CPAN), Perl-DBI module, Perl-YAML module, and the PostgreSQL client (for testing the psql connection). To set these up, we provide scripts. These scripts also create a schema, a user, and a role in the PostgreSQL database.
Download the Oracle instant client .rpm files from the Instant Client Downloads for Linux x86-64 (64-bit) page. The download process requires you to log in to your own OTN account. The files to download are `oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64.rpm`, `oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64.rpm`, and `oracle-instantclient11.2-devel-11.2.0.4.0-1.x86_64.rpm`. Make sure that you download the 11.2.0.4 version of the .rpm files.
For this configuration, do the following:
- Copy the Oracle client .rpm files to the /tmp directory of the second Amazon EC2 instance that was created using the CloudFormation template. You can get the second EC2 instance's public DNS value from the preceding template output table's key `EC2InstanceForOra2PgPublicDNS`.
- Follow the instructions listed in AWS CLI Getting Started to configure the AWS CLI using the `aws configure` command. The user for this configuration needs to be set as `ec2-user`.
- Download the Ora2pg configuration scripts from Amazon S3. The script names are `setting_ora2pg_host_step1.sh` and `setting_ora2pg_host_step2.sh`. The `setting_ora2pg_host_step1.sh` script installs the Oracle 11.2 client .rpm files, sets up the Oracle environment variables, and installs the Perl CPAN repository, the Perl Database Interface (Perl-DBI), the Perl YAML module, the PostgreSQL client, and the Perl DBD::Pg module. These are required for a successful configuration of the Ora2pg software. The `setting_ora2pg_host_step2.sh` script downloads and installs Ora2pg version 18.1 from GitHub. It then checks the database connections to the source and target, creates Ora2pg configuration files for each table, and generates the target schema structure. It creates a user and a role, and it creates a schema structure in the target RDS for PostgreSQL database. It then gets the latest Oracle system change number (SCN) and converts it into Unix time in milliseconds. The Unix time is used as the CDC start time value when you create the DMS tasks. Finally, the script initiates the copy of the source RDS for Oracle tables' data into the target RDS for PostgreSQL database. As the `ec2-user` user, run the provided commands to download the two scripts that configure the Ora2pg tool.
- Execute the first script, passing these parameters to the script:
  - The RDS for Oracle database endpoint that was created earlier. Get this value from the output section of the first CloudFormation template.
  - The Amazon RDS for Oracle DB name that was selected as part of the RDS for Oracle database restore.

  Following is an example; a hedged invocation appears at the end of this step. Modify `awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com` to be the correct RDS for Oracle endpoint.
- Get the CPAN `Test::More`, `DBD::Pg`, and `DBD::Oracle` modules and install and compile them as described following. The `DBD::Oracle` and `DBD::Pg` modules are Perl modules that work with the database interface (DBI) to provide access to the source RDS for Oracle instance and the target RDS for PostgreSQL instance. You need `Test::More` for successful installation of the Oracle and Pg modules. To install and compile these modules, run the following commands:
  - Run the `cpan` command at the Unix shell prompt. The first time that you run `cpan`, you're prompted to answer a few questions; just accept the default values by pressing the Enter key on your keyboard. When `cpan` is set up, it takes you to the cpan shell.
  - At the cpan shell, run the command that installs the `Test::More` module, which is needed for successful installation of the `DBD::Pg` and `DBD::Oracle` modules. After this command runs successfully, it exits from the cpan shell and takes you to the Unix shell prompt.
  - At that prompt, run the commands to build the module. When these commands are complete, exit from the shell by running the `exit` command as many times as needed, until you fully exit the cpan shell.
DBD::Pg
module using the following commands at the cpan prompt:- Log in to the cpan shell using the
cpan
command. - Run this command at the cpan prompt:
look DBD::Pg
After you run this command, you return to the Unix shell prompt. - At that prompt, run this command:
perl Makefile.PL
- Then run this command:
sudo make
After running this command, you might see warning messages like this:Ignore these messages.
- Run this command:
sudo make install
- When this command is complete, exit the shell by running the
exit
command.
- Log in to the cpan shell using the
- Install the `DBD::Oracle` module using the following commands:
  - Log in to the cpan shell using the `cpan` command.
  - Run this command at the cpan prompt: `look DBD::Oracle`. After you run this command, you return to the Unix shell prompt.
  - At that prompt, run this command: `perl Makefile.PL`
  - Then run this command: `sudo make`. After running this command, you might see some warning messages; ignore them.
  - Run this command: `sudo make install`
  - When this command is complete, exit the shell by running the `exit` command as many times as needed, until you fully exit the cpan shell.
- Execute the second script as described following. Make sure that the first script has completed before proceeding to the second script. Run the second script with the parameters described in the following table; a hedged example invocation appears at the end of this step.

| | Script parameter | Value | Comments |
|---|---|---|---|
| 1 | `<RDS_ORACLE_END_POINT>` | awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com | Endpoint for the source RDS for Oracle database. Change this value to your appropriate endpoint. |
| 2 | `<ORA_DB_NAME>` | AWSORA | Database name in the source RDS for Oracle database. This value was provided as part of the database restore. |
| 3 | `<PG_RDS_END_POINT>` | awspgins.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com | Target RDS for PostgreSQL endpoint. Change this value to your appropriate endpoint. |
| 4 | `<PG_MASTER_USERNAME>` | root | Admin user name that was selected as part of the RDS for PostgreSQL database creation. |
| 5 | `<PG_MASTER_PASSWORD>` | Rootroot123 | Admin user password that was selected as part of the RDS for PostgreSQL database creation. |
| 6 | `<PG_DATABASE_NAME>` | awspgdb | Database name that was selected as part of the RDS for PostgreSQL database creation. |
| 7 | `<PG_MIGRATION_USER>` | miguser | User name that is created in the PostgreSQL database and is used for data migration. |
| 8 | `<PG_MIGRATION_PASSWORD>` | migpasswd | Password for the migration user. |
| 9 | `<PG_MIGRATION_SCHEMA>` | awspgschema | Name of the schema that is created in the PostgreSQL database. All tables are migrated to this schema. Make sure that you use lowercase letters for the schema name. |
| 10 | `<PG_CONTROL_SCHEMA>` | awspgctrlschema | Name of the control schema that is created in the PostgreSQL database. This schema name is used by the control tables that are created as part of the AWS DMS setup. Make sure that you use lowercase letters for the control schema name. |
When you execute the script, at the end it starts the Ora2pg jobs in `nohup` mode (that is, they execute in the background). To exit the program, just press the Enter key on your keyboard.

Note: The second script creates a schema and the required tables in the target RDS for PostgreSQL database. It also starts the data copy from the source RDS for Oracle database to the target RDS for PostgreSQL database. Before copying the data using Ora2pg, it gets the latest SCN number and the time stamp from the RDS for Oracle database. This information is needed when starting the DMS tasks. It is saved in a file named `scn_information_from_source.out` in the home directory of the `ec2-user` user on this EC2 instance (the instance with the tag name `Ora2pg-blog-EC2InstanceFor-Ora2pg-Configuration`). Sample output from this log file is as follows:

This script executes four separate Ora2pg jobs: one each for the `CUSTOMER`, `PART`, and `SUPPLIER` tables, and a single job for the `REGION` and `NATION` tables. These jobs run in `nohup` mode (that is, they execute in the background).

The `CUSTOMER` and `PART` tables each have a CLOB data column. While migrating the data into the target RDS for PostgreSQL database, we simulate loading more transactions into the `SUPPLIER` and `CUSTOMER` tables, so we run separate jobs for these tables. Because the `PART` table has a CLOB column, we also run a separate job for that table. The `NATION` and `REGION` tables don't have any new inserts coming in, so we group them into a single Ora2pg job.

Each of these jobs generates its own log file. The log file names are `ora2pg_aws_rds_oracle_to_pg_customer_blog.out`, `ora2pg_aws_rds_oracle_to_pg_part_blog.out`, `ora2pg_aws_rds_oracle_to_pg_region_nation_blog.out`, and `ora2pg_aws_rds_oracle_to_pg_supplier_blog.out`. These are stored in the home directory.

In our testing, executing this script took around 1 hour and 20 minutes. While the script is executing, you can check these log files to see whether the copy is still running. When the script has completed, you see a message like this: `5080.91user 55.69system 1:18:57elapsed`. The `CUSTOMER` and `PART` tables' jobs take a long time to finish because they have CLOB column data to copy from the source RDS for Oracle database to the target PostgreSQL database.
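As referenced above, here is a sketch of running the first script and installing the Perl modules. The script arguments and the non-interactive cpan form are assumptions for illustration; the interactive `look`/`make` flow described in the steps above achieves the same result:

```bash
# Run the first configuration script (endpoint and DB name are examples; use your own)
sh setting_ora2pg_host_step1.sh \
   awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com \
   AWSORA

# Non-interactive equivalent of the cpan steps described above
sudo cpan -i Test::More          # needed before building the DBD modules
sudo cpan -i DBD::Pg             # Perl driver for PostgreSQL
# DBD::Oracle needs ORACLE_HOME pointing at the 11.2 instant client
export ORACLE_HOME=/usr/lib/oracle/11.2/client64
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
sudo -E cpan -i DBD::Oracle      # Perl driver for Oracle
```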
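Likewise, a hedged example invocation of the second script, with the parameters in the order given in the preceding table (substitute your own endpoints):

```bash
# Positional parameter order follows the table above (an assumption for illustration)
sh setting_ora2pg_host_step2.sh \
   awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com \
   AWSORA \
   awspgins.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com \
   root \
   Rootroot123 \
   awspgdb \
   miguser \
   migpasswd \
   awspgschema \
   awspgctrlschema
```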
4. Load new data into the source Oracle database
When the copy from the source RDS for Oracle database to the target RDS for PostgreSQL database has started, simulate loading new data into the source RDS for Oracle database:
- Log in to the first Amazon EC2 instance, with the tag name `Ora2pg-blog-EC2InstanceFor-SourceDataLoading`, which is used for loading the test data. You can get the EC2 instance's public DNS value from the CloudFormation output key `EC2InstanceForSourceDataLoadingPublicDNS`.
- Run the provided command to load new data into the `CUSTOMER` table (a hedged example appears after this list). This command loads 100 new rows into the `CUSTOMER` table in the source Oracle database. Make sure that you run this command on the `EC2InstanceForSourceDataLoadingPublicDNS` instance. Here, AWSORA is the name of the database that was created in the source RDS for Oracle database.
- Run the provided command to load new data into the `SUPPLIER` table. This command loads an additional 1,000,000 rows into the `SUPPLIER` table. Make sure that you run this command on the first EC2 instance.
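The exact load commands come with the scripts downloaded to that instance; conceptually they are SQL*Loader invocations along the lines of the following sketch (control file names and paths are assumptions for illustration):

```bash
# Load 100 new rows into CUSTOMER (control file name is hypothetical)
sqlldr awsorauser/awsorauser@//awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:1521/AWSORA \
       control=/local/mnt6/customer_new_rows.ctl log=customer_new_rows.log

# Load 1,000,000 additional rows into SUPPLIER (control file name is hypothetical)
sqlldr awsorauser/awsorauser@//awsorains.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:1521/AWSORA \
       control=/local/mnt6/supplier_new_rows.ctl log=supplier_new_rows.log
```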
5. Create the dms-vpc-role IAM role if it doesn't exist and add an IAM policy
AWS accounts that have never used AWS DMS before get a failure when creating the `AWS::DMS::ReplicationSubnetGroup` resource with a CloudFormation template, because the `dms-vpc-role` IAM role doesn't exist. Creation of the CloudFormation stack in the next step also fails if the `dms-vpc-role` role doesn't exist or isn't configured properly.
You can check whether this IAM role exists, and create it if it doesn't, by using the AWS CLI:
- Check whether the `dms-vpc-role` IAM role exists.
- Create the `dms-vpc-role` role if it doesn't exist.
Next, to enable the following steps, attach the AWS managed policy `AmazonDMSVPCManagementRole` to the `dms-vpc-role` IAM role. A sketch of these CLI commands follows.
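They follow this pattern (the trust policy is the standard one that allows AWS DMS to assume the role):

```bash
# Check whether the role exists
aws iam get-role --role-name dms-vpc-role

# Create the role with a trust policy that lets AWS DMS assume it
aws iam create-role --role-name dms-vpc-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": { "Service": "dms.amazonaws.com" },
        "Action": "sts:AssumeRole"
      }
    ]
  }'

# Attach the AWS managed policy
aws iam attach-role-policy --role-name dms-vpc-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole
```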
6. Create the AWS DMS source and target endpoints and DMS instance
Using this CloudFormation template, set up the DMS source endpoint, target endpoint, and DMS replication instance. To launch the template directly through the console, choose the following Launch Stack button:
This template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.
| For this parameter | Use this | Predefined or manual entry? |
|---|---|---|
| Stack name | Provide a unique stack name (different from the first stack), for example `ora2pg-dms-stack-2` | Manual entry |
| DMSInstanceType | dms.c4.4xlarge | Manual entry |
| SourceRDSOracleEndpointID | Provide the RDS for Oracle source endpoint | Manual entry |
| SourceRDSOracleDatabaseName | AWSORA | Predefined |
| SourceRDSOracleUserName | awsorauser | Predefined |
| SourceRDSOracleDBPassword | awsorauser | Predefined |
| SourceRDSOracleDBPort | 1521 | Predefined |
| TargetRDSPostgreSQLEndpointID | Provide the RDS for PostgreSQL endpoint | Manual entry |
| TargetRDSPostgreSQLDatabaseName | awspgdb | Predefined |
| TargetRDSPostgreSQLUserName | root | Predefined |
| TargetRDSPostgreSQLDBPassword | Rootroot123 | Predefined |
| TargetPostgreSQLDBPort | 5432 | Predefined |
| VPCID | Select the VPC that was created by the first CloudFormation template | Manual entry |
| VPCSecurityGroupId | Provide the ID of the security group that was created using the first CloudFormation template. You can get the value from the Output section of that template. | Manual entry |
| DMSSubnet1 | Provide the first subnet ID corresponding to your VPC. This subnet was created as part of the first CloudFormation template. Get the value from the Output section of that template. | Manual entry |
| DMSSubnet2 | Provide the second subnet ID corresponding to your VPC. This subnet was created as part of the first CloudFormation template. Get the value from the Output section of that template. | Manual entry |
| DMSCDCTaskStartTime | Provide the CDC start time as 15 minutes before the time stamp value from the scn_information_from_source.out log file generated by the Ora2pg full copy. The time must be in this format: YYYY-MM-DD HH:MI:SS | Manual entry |
After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.
When the stack launch is complete, it returns output with information about the resources that were created. Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the `aws cloudformation describe-stacks` command shown earlier.
The output from the preceding template is as follows.
| Key | Value |
|---|---|
| DMSRepulicationInstance | ARN of the DMS replication instance |
| DMSPostgreSQLEndpoint | ARN of the DMS PostgreSQL endpoint |
| DMSOracleEndpoint | ARN of the DMS Oracle endpoint |
7. Check that the scripts are completed
Before proceeding with replication, first make sure that the `setting_ora2pg_host_step2.sh` script has completed.
You can check this by looking at the log files `ora2pg_aws_rds_oracle_to_pg_part_blog.out` and `ora2pg_aws_rds_oracle_to_pg_customer_blog.out`. These files are on the EC2 instance identified by the `EC2InstanceForOra2PgPublicDNS` output of the first CloudFormation template. If the script has completed, you should see a message like `5080.91user 55.69system 1:18:57elapsed` at the end of the log files.
You can also verify this by checking the number of rows the script has copied into the target RDS for PostgreSQL database. To check the number of rows, run the following commands to connect to the PostgreSQL database from the `EC2InstanceForOra2PgPublicDNS` instance:
- Connect to the PostgreSQL database with the psql command (a hedged example appears after this list). In the command, change the value for `<RDS_Postgresql_EndPoint>` to the correct RDS for PostgreSQL endpoint. You're prompted for the password for the root user, which is `Rootroot123`.
- When you're logged in, change the schema search path.
- Run SELECT count queries to check the number of rows in the PostgreSQL database. The `CUSTOMER` and `PART` tables should each show 5,000 rows, because we migrated 5,000 rows from the `CUSTOMER` and `PART` tables in the source RDS for Oracle database to the target RDS for PostgreSQL database.
- Use the `\q` command to exit from the psql shell.
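A sketch of these checks, assuming the schema and credentials from the earlier tables and that Ora2pg created the tables with lowercase names (its default):

```bash
# Connect to the target database and check row counts
# (replace the endpoint; the root password is Rootroot123)
psql --host=<RDS_Postgresql_EndPoint> --port=5432 --username=root --dbname=awspgdb <<'SQL'
SET search_path TO awspgschema;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM supplier;
SQL
```

Interactively, you can run the same statements one at a time at the psql prompt.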
8. Create DMS tasks with CDC to enable replication
Now that we have completed the initial data migration from RDS for Oracle to RDS for PostgreSQL, we are ready to replicate the remaining data and can enable ongoing replication. To do so, we use AWS DMS with the CDC option. Because we divided our migration process into four tasks, we create four AWS DMS tasks to replicate the changes made on the source RDS for Oracle database to the target RDS for PostgreSQL database.
Before creating tasks, make sure that you test the connections from the DMS replication instance to the source and target RDS databases using the endpoints. To do so, first go to the Endpoints section in the DMS console, select your source endpoint, and choose Test Connection. Next, select the correct replication instance from the list, and choose Run test. Then, go to the Endpoints section again, select the target endpoint, and choose Test Connection. Follow the same steps to make sure both connections are successful.
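If you prefer the AWS CLI, the same connection tests can be run with `aws dms test-connection`, using the ARNs from the previous stack's output (a sketch; replace the placeholder ARNs):

```bash
# Test the source (Oracle) endpoint against the replication instance
aws dms test-connection \
  --replication-instance-arn <DMS_REPLICATION_INSTANCE_ARN> \
  --endpoint-arn <DMS_ORACLE_ENDPOINT_ARN>

# Test the target (PostgreSQL) endpoint
aws dms test-connection \
  --replication-instance-arn <DMS_REPLICATION_INSTANCE_ARN> \
  --endpoint-arn <DMS_POSTGRESQL_ENDPOINT_ARN>

# Review the test results
aws dms describe-connections
```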
Then, using this CloudFormation template, set up the DMS tasks. To launch the template directly through the console, choose the following Launch Stack button:
This template takes the following parameters. Some of the parameters are predefined, and you can’t edit these. For some of the parameters, you need to provide the values. The table gives details.
| For this parameter | Use this | Predefined or manual entry? |
|---|---|---|
| DMSCDCTaskStartTimeInMilliSeconds | Provide the time stamp value (in milliseconds) from the scn_information_from_source.out log file. | Manual entry |
| DMSRepulicationInstanceARN | Provide the Amazon Resource Name (ARN) for the replication instance. You can get this value from the output of the previous template. | Manual entry |
| DMSSourceOracleEndpointARN | Provide the endpoint ARN for the DMS source Oracle endpoint. You can get this value from the output of the previous template. | Manual entry |
| DMSTargetSPostgreSQLEndpointARN | Provide the endpoint ARN for the DMS target PostgreSQL endpoint. You can get this value from the output of the previous template. | Manual entry |
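If you ever need to reproduce the milliseconds value manually (the second script already writes it to scn_information_from_source.out), it is the Unix epoch time of the recorded UTC time stamp multiplied by 1,000; for example, with GNU date:

```bash
# Convert a UTC time stamp (example value) to Unix epoch milliseconds
echo $(( $(date -u -d '2018-06-01 10:00:00' +%s) * 1000 ))
```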
After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.
The preceding CloudFormation template creates the tasks, which move to the Ready state. When the stack launch is complete, manually start the DMS tasks.
To do this, sign in to the DMS console and navigate to the Tasks page. Select one task at a time and choose Start/Resume for each. Doing so initiates change data capture starting at the specified time from the source RDS for Oracle database to the target RDS for PostgreSQL database.
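Alternatively, each task can be started from the CLI. The `--cdc-start-time` value corresponds to the custom CDC start time discussed earlier; if the start time is already defined on the task, the flag can be omitted (a sketch; replace the placeholder ARN):

```bash
# Start one replication task with a custom CDC start time
aws dms start-replication-task \
  --replication-task-arn <DMS_TASK_ARN> \
  --start-replication-task-type start-replication \
  --cdc-start-time 2018-06-01T10:00:00
```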
In a real-world scenario, once the DMS tasks have replicated all the data changes to the RDS for PostgreSQL database, you stop all connections to the source Oracle database. You then wait for AWS DMS to apply any remaining transactions to the target database. When it has caught up, you can stop the DMS tasks in the DMS console.
To finish the process, recreate triggers, indexes, and foreign key constraints on the target database. Drop the existing sequences in the PostgreSQL database, and migrate the sequences separately using Ora2pg with a CACHE value of 1. Make sure that data validations are performed, then cut over your applications to point to PostgreSQL and restart them.
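A hedged example of exporting the sequences with Ora2pg (the configuration file path and output locations are assumptions):

```bash
# Export only the sequences from the source Oracle database to a SQL file
ora2pg -c /etc/ora2pg/ora2pg.conf -t SEQUENCE -b /tmp/ora2pg_out -o sequences.sql

# Review sequences.sql (set CACHE 1 and adjust start values as needed), then apply it
psql --host=<RDS_Postgresql_EndPoint> --username=root --dbname=awspgdb -f /tmp/ora2pg_out/sequences.sql
```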
Cleaning up after the migration
After completing and testing this solution, clean up the resources by stopping your tasks and deleting the CloudFormation stacks:
- Go to the DMS console, navigate to the Tasks page, and stop the tasks manually one after another.
- Go to the CloudFormation console and delete the stacks in reverse order of their creation.
- Delete the Amazon EBS volume that was created as part of this blog post:
  - Go to the EC2 console, and choose Volumes in the Elastic Block Store section.
  - Select the EBS volume with the name `oracle-client-for-rds-oracle-restored-volume1`.
  - For Actions, choose Delete Volume.
- Also delete any RDS snapshots that you created for these databases.
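The same cleanup can be scripted with the AWS CLI (the task ARN, stack names, and volume ID are examples; substitute your own):

```bash
# Stop each DMS task (repeat per task ARN)
aws dms stop-replication-task --replication-task-arn <DMS_TASK_ARN>

# Delete the stacks in reverse order of creation (names are examples)
aws cloudformation delete-stack --stack-name ora2pg-dms-tasks-stack
aws cloudformation delete-stack --stack-name ora2pg-dms-stack-2
aws cloudformation delete-stack --stack-name ora2pg-dms-stack-1

# Find and delete the extra EBS volume by its Name tag
aws ec2 describe-volumes \
  --filters "Name=tag:Name,Values=oracle-client-for-rds-oracle-restored-volume1" \
  --query "Volumes[0].VolumeId" --output text
aws ec2 delete-volume --volume-id vol-0123456789abcdef0
```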
Conclusion
In this post, we give you an overview of migrating an Amazon RDS for Oracle database to Amazon RDS for PostgreSQL using the open-source tool Ora2pg and AWS DMS. You can also use the same procedure to migrate your on-premises Oracle database to an Amazon RDS for PostgreSQL database or Amazon Aurora PostgreSQL database. To accomplish this, you can migrate the initial data load using various methods and then migrate the remaining data using the custom CDC start-time option available in AWS DMS, as we have shown. You enable the ongoing replication until the cutover is complete.
About the Author
Srikanth Kodali is a big data consultant at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance about database projects, helping them improve the value of their solutions when using AWS.