Grouping database tables in AWS DMS tasks for Oracle source engine
AWS Database Migration Service (AWS DMS) is a cloud service designed to simplify the process of migrating and replicating databases, data warehouses, and other data stores. It offers a comprehensive solution for both homogeneous and heterogeneous database migrations, facilitating transitions between different database platforms.
The migration process typically involves two major phases:
- Migration of existing data (full load) – During this phase, AWS DMS performs a full load migration, where all existing data from the source data store is transferred to the target data store.
- Ongoing replication (change data capture) – After the initial full load migration, AWS DMS enables ongoing replication of data changes through a process called change data capture (CDC). The changes are continuously replicated and applied to the target data store in real time or near real time, making sure the target data store remains synchronized with the source.
AWS DMS supports a wide range of source and target data stores, including relational databases, data warehouses, and NoSQL databases. Although AWS DMS provides powerful tools for database migration, every migration project is unique and poses distinct challenges, so there is no one-size-fits-all solution. For a successful migration, it's crucial to address the database and application prerequisites, and initial preparation and design play a pivotal role, particularly when it comes to performance tuning and addressing potential latency issues.
In this post, we provide guidelines on identifying potential root causes of full load and CDC latency early on, and offer recommendations for optimally grouping tables to achieve the best performance for the AWS DMS task. By following the approach outlined in this post, you can more effectively plan your database migration, estimate task sizes, and design an efficient migration process that minimizes potential issues and optimizes performance.
Know your data
Working on a migration project requires you to first understand what data you're migrating. The full load migration phase involves transferring a complete copy of the source data to the target database. This process makes sure the target database is populated with an identical version of the data from the source database as it existed when the migration started.
CDC is a technique that captures incremental changes to data and data structures (schemas) from the source database in real time or near real time. CDC enables the propagation of these changes to other databases or applications, making sure they remain synchronized with the source database.
You can assess the full load and CDC requirements for a database migration by analyzing the database size, workload, and hardware information, and use that analysis to determine the number of tasks required and the grouping of tables within each task.
Various factors can directly and indirectly influence the speed of data migration while using AWS DMS. The following are some of the common factors for full load as well as CDC:
- Database object size – Keeping huge tables (larger than 2 TB) in a dedicated migration task can help improve migration efficiency. By isolating the handling of extensive data sets within a specific task or operation, the migration process can potentially become more streamlined and effective.
- Partitioned and non-partitioned objects – You can migrate large partitioned tables by loading multiple tables in parallel. AWS DMS also allows you to load a single large table using multiple parallel threads. This is particularly useful for tables with billions of records spread across multiple partitions and subpartitions.
- Objects without a primary key and unique index – To migrate large objects (LOBs), AWS DMS requires the source table to have a primary key or unique index.
- LOBs – LOB columns need special treatment because AWS DMS can't determine the LOB size per row and column in advance, and therefore can't allocate appropriate memory on the replication instance. AWS DMS provides full, limited, and inline LOB modes to migrate LOB data. Keeping tables with LOBs in a separate task can help you manage the migration activity efficiently.
- Volume of changes – If your workload involves a high volume of CDC changes, such as repeatedly updating the same set of records or inserting, updating, and deleting the same records, you can improve the target apply throughput by using batch apply.
Solution overview
The goal of this post is to analyze the source database’s data dictionary and combine that information with hardware details to create recommendations for efficient data migration tasks. This analysis helps determine the optimal number of AWS DMS tasks and grouping of tables within those tasks, reducing potential latency issues during the migration process.
The workflow involves the following steps:
1. Create a control table on the source database.
2. Populate the control table by analyzing table size, partitions, indexes, constraints, data types, and LOB data using data dictionary tables and views.
3. Capture the daily growth of tables by monitoring the volume of incoming changes.
4. Run the grouping code to group the tables into AWS DMS tasks.
The following diagram illustrates the solution architecture.
Prerequisites
To follow along with this post, you should have familiarity with the following:
- AWS DMS
- Amazon Relational Database Service (Amazon RDS) for Oracle
- SQL and PL/SQL procedures
Create a control table
In this step, we create a control table named table_mapping, which provides a single consolidated view of the data we're migrating. This table is built from data dictionary information on table size, partitioning and partition sizes (count, average, minimum, maximum), LOB columns, number of indexes, primary key or unique key constraints, foreign key constraints, and the count of truncate, insert, update, and delete operations on each table.
This control table provides the baseline data used in the next step to group tables.
- Connect to your source Oracle database.
- Run the following SQL block to create the control table:
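The following is a minimal sketch of such a control table. The column names and data types are illustrative (not the exact DDL used in this post) and should be adapted to your environment:

```sql
-- Minimal sketch of the control table; column names and types are illustrative
CREATE TABLE table_mapping (
  owner             VARCHAR2(128),
  table_name        VARCHAR2(128),
  size_in_mb        NUMBER,
  partitioned       VARCHAR2(3),      -- YES/NO
  partition_count   NUMBER,
  avg_part_size_mb  NUMBER,
  min_part_size_mb  NUMBER,
  max_part_size_mb  NUMBER,
  num_indexes       NUMBER,
  pk_present        VARCHAR2(1),      -- Y if a primary key exists
  uk_present        VARCHAR2(1),      -- Y if a unique key exists
  fk_present        VARCHAR2(1),      -- Y if foreign key constraints exist
  lob_column        VARCHAR2(1),      -- Y if the table has at least one LOB column
  truncate_count    NUMBER,
  insert_count      NUMBER,
  update_count      NUMBER,
  delete_count      NUMBER,
  avg_daily_dml     NUMBER,
  step              NUMBER,
  sub_step          NUMBER
);
```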
Populate the control table
In this step, you populate the control table using data dictionary objects in the Oracle database that provide information about the size, type, partitioning, constraints, and LOB data related to database objects. Specifically, the following data dictionary objects are used:
- DBA_SEGMENTS – Provides information about the size and type of database objects, such as tables, indexes, and clusters
- DBA_CONSTRAINTS – Contains details about the different types of constraints defined on database objects, such as primary keys, foreign keys, unique constraints, and check constraints
- DBA_LOBS – Provides information about LOB data types, such as Binary Large Objects (BLOBs) and Character Large Objects (CLOBs), which are used to store large amounts of unstructured data
These data dictionary objects can be queried to retrieve metadata about the database objects, allowing administrators and developers to identify and analyze the structure, size, constraints, and modifications made to various objects in the database.
We read from the data dictionary objects into the control table.
- Insert into the control table the database table details, such as name, size, partition, and so on, for the desired schema (here we are using 'ADMIN').
- Update the PK_PRESENT field of the control table for database tables that have primary keys defined.
- Update the UK_PRESENT field of the control table for database tables that have unique keys defined.
- Update the LOB_COLUMN field of the control table for database tables that have at least one LOB column.
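The following is a condensed sketch of these insert and update statements, based on the illustrative table_mapping columns shown earlier; adjust the schema name and column list to match your environment:

```sql
-- Sketch: load table sizes for the ADMIN schema from DBA_SEGMENTS
-- (additional columns such as partition and index counts would be populated similarly)
INSERT INTO table_mapping (owner, table_name, size_in_mb, partitioned)
SELECT s.owner,
       s.segment_name,
       ROUND(SUM(s.bytes) / 1024 / 1024) AS size_in_mb,
       MAX(CASE WHEN s.segment_type LIKE 'TABLE%PARTITION' THEN 'YES' ELSE 'NO' END)
FROM   dba_segments s
WHERE  s.owner = 'ADMIN'
AND    s.segment_type IN ('TABLE', 'TABLE PARTITION', 'TABLE SUBPARTITION')
GROUP  BY s.owner, s.segment_name;

-- Sketch: flag tables that have a primary key
UPDATE table_mapping t
SET    t.pk_present = 'Y'
WHERE  EXISTS (SELECT 1 FROM dba_constraints c
               WHERE  c.owner = t.owner
               AND    c.table_name = t.table_name
               AND    c.constraint_type = 'P');

-- Sketch: flag tables that have a unique key
UPDATE table_mapping t
SET    t.uk_present = 'Y'
WHERE  EXISTS (SELECT 1 FROM dba_constraints c
               WHERE  c.owner = t.owner
               AND    c.table_name = t.table_name
               AND    c.constraint_type = 'U');

-- Sketch: flag tables that contain at least one LOB column
UPDATE table_mapping t
SET    t.lob_column = 'Y'
WHERE  EXISTS (SELECT 1 FROM dba_lobs l
               WHERE  l.owner = t.owner
               AND    l.table_name = t.table_name);

COMMIT;
```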
Capture the volume of DML changes over time
The DBA_TAB_MODIFICATIONS view in the Oracle database contains information about modifications made to all tables in the database since the last time statistics were gathered on those tables. This includes both partition and non-partition operations on the tables.
By reviewing the data from DBA_TAB_MODIFICATIONS on a daily basis, you can gain insights into the daily modifications made to tables. This information can be valuable for identifying tables that have undergone a significant number of changes, which may impact their performance or require maintenance tasks like statistics gathering or reorganization.
- Create a staging table called MST_DBA_TAB_MOD to get the details from DBA_TAB_MODIFICATIONS.
- Populate the MST_DBA_TAB_MOD table with the daily average DML count for the tables by gathering the information from the Oracle metadata views dba_tab_modifications and dba_tables. The counts are noted from the last analyze date and then averaged, so having recent table statistics makes the result more accurate.
- Next, we populate the DML information in the table_mapping control table by running an update statement that collects this information from MST_DBA_TAB_MOD, which we populated in the previous step.
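The following sketch illustrates these steps. The MST_DBA_TAB_MOD structure and the averaging logic are illustrative:

```sql
-- Sketch: staging table for daily DML volumes
CREATE TABLE mst_dba_tab_mod (
  table_owner    VARCHAR2(128),
  table_name     VARCHAR2(128),
  inserts        NUMBER,
  updates        NUMBER,
  deletes        NUMBER,
  avg_daily_dml  NUMBER,
  capture_date   DATE DEFAULT SYSDATE
);

-- Sketch: average the DML counts since the last analyze date
-- (optionally run DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO first so recent DML is visible)
INSERT INTO mst_dba_tab_mod (table_owner, table_name, inserts, updates, deletes, avg_daily_dml)
SELECT m.table_owner,
       m.table_name,
       m.inserts,
       m.updates,
       m.deletes,
       ROUND((m.inserts + m.updates + m.deletes) /
             GREATEST(TRUNC(SYSDATE) - TRUNC(t.last_analyzed), 1))
FROM   dba_tab_modifications m
JOIN   dba_tables t
       ON t.owner = m.table_owner
      AND t.table_name = m.table_name
WHERE  m.table_owner = 'ADMIN'
AND    m.partition_name IS NULL;      -- table-level rows only

-- Sketch: carry the averaged DML volume into the control table
UPDATE table_mapping tm
SET    tm.avg_daily_dml = (SELECT MAX(s.avg_daily_dml)
                           FROM   mst_dba_tab_mod s
                           WHERE  s.table_owner = tm.owner
                           AND    s.table_name  = tm.table_name);
COMMIT;
```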
Categorize the tables by step number
Tables are then categorized in the database based on various factors related to their data characteristics, as shown in the following code. These factors include whether the table is partitioned or non-partitioned, whether it contains LOBs or not, the table size, and the number of DML operations performed on the table. Non-partitioned tables that don't have LOB fields are categorized as step 1, non-partitioned tables with LOB fields are categorized as step 2, partitioned tables without LOB fields are categorized as step 3, and so on. For example, we have classified all tables reporting SIZE_IN_MB as 0 as out of scope for the migration. You can add additional tables to this step if your use case needs it.
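A sketch of this categorization logic, with illustrative size and DML thresholds, might look like the following:

```sql
-- Sketch: assign step numbers based on partitioning, LOBs, size, and DML volume
UPDATE table_mapping SET step = 0 WHERE size_in_mb = 0;                                        -- out of scope
UPDATE table_mapping SET step = 1 WHERE step IS NULL AND partitioned = 'NO'  AND lob_column IS NULL;
UPDATE table_mapping SET step = 2 WHERE step IS NULL AND partitioned = 'NO'  AND lob_column = 'Y';
UPDATE table_mapping SET step = 3 WHERE step IS NULL AND partitioned = 'YES' AND lob_column IS NULL;
UPDATE table_mapping SET step = 4 WHERE step IS NULL AND partitioned = 'YES' AND lob_column = 'Y';
UPDATE table_mapping SET step = 5 WHERE NVL(step, 0) > 0 AND avg_daily_dml > 1000000;          -- high DML tables (threshold is illustrative)
COMMIT;
```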
The categorization of tables based on these attributes is done for optimization purposes.
Group tables
The process of recommending groups begins by creating and populating the TABLE_MAPPING_GROUPS table based on the information stored in the TABLE_MAPPING control table. The process is initiated by a procedure that takes three parameters:
- Mapping table (TABLE_MAPPING)
- Source migration schema (ADMIN)
- Size of the DB objects per task (600 GB)
The size of the objects per task (600 GB) is chosen to evenly distribute the tables across tasks, considering factors such as CPU, network, and I/O capabilities of the source and target replication instances.
We use a shell script to call the stored procedure in cap_parfile_grouping_gen.sql to group the tables, and then list the grouped tables by step and groupnum using another SQL file, temp_sql_gen.sql.
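At the SQL level, a sketch of what the script runs might look like the following; the procedure name (assumed here to match the file name) and its parameter order are illustrative:

```sql
-- Sketch: compile the grouping procedure, run it for the ADMIN schema with a
-- target group size of 600 GB, then list the resulting groups
@cap_parfile_grouping_gen.sql

EXEC cap_parfile_grouping_gen('TABLE_MAPPING', 'ADMIN', 600)

@temp_sql_gen.sql
```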
Next, the procedure contained in cap_parfile_grouping_gen.sql has two separate sections, one for partitioned tables and another for non-partitioned tables, because these are grouped separately based on the information provided. The TABLE_MAPPING_GROUPS table is created and populated with the grouped tables from the TABLE_MAPPING table. This grouping process helps evenly distribute the data across tasks, facilitating efficient replication or data movement operations.
The final temp_sql_gen.sql comprises a select statement that iteratively goes over table_mapping_groups and lists the table names by the respective group names, which can be used to create the respective AWS DMS tasks.
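A sketch of such a statement, based on the illustrative column names used earlier, could be:

```sql
-- Sketch: list table names per group, ready to use in AWS DMS task table mappings
SELECT groupnum,
       COUNT(*)                      AS table_count,
       ROUND(SUM(size_in_mb) / 1024) AS group_size_gb,
       LISTAGG(table_name, ',') WITHIN GROUP (ORDER BY table_name) AS table_list
FROM   table_mapping_groups
GROUP  BY groupnum
ORDER  BY groupnum;
```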
Demonstration
For demonstration purposes, we use an Oracle source instance with a size of 4.4 TB and group tables into tasks of approximately 600 GB each. The instance type used is db.m5.4xlarge, with 8 cores, 16 vCPUs, and 64 GiB of memory. After evaluating CPU utilization, network bandwidth, and workload, it was determined that grouping tasks in sets of 600 GB would be ideal for this workload. However, it's recommended to carefully review the source database size, workload, CPU, memory, and network utilization to identify the appropriate group size, because each source is different.
The algorithm loops over the control table, sorting the tables within each step by size in descending order, and then groups the sorted tables based on the specified size input parameter.
We have combined the following steps into a shell script so you don't have to manually run the individual steps:
- Create a control table
- Populate the control table
- Capture the volume of DML changes
- Categorize the tables by step number
The following code is the associated shell script:
The following code is the associated table_mapping SQL file:
The following code is a snippet of the control table table_mapping, populated from the data dictionary after running the shell script mentioned in the previous step:
The following is a screenshot of the code snippet.
After the table grouping script is run, the final output consists of six initial step numbers (0 – 5):
- 0 for ignore or skip for special handling tables
- 1 for non-partitioned tables
- 2 for non-partitioned tables with LOBs
- 3 for partitioned tables
- 4 for partitioned tables with LOBs
- 5 for high DML tables
Next, we run the table grouping shell script to create the final grouping of the tables. Tables that are part of a group will be included under a single AWS DMS task. We already went over the code details in prior sections.
The following code illustrates the shell script details:
The PL/SQL procedure is used to generate a final recommendation for migrating data tables. It involves running a PL/SQL block with the following steps:
- Check and drop an existing table named table_mapping_groups to create a fresh one
- Create a new table_mapping_groups table from the TABLE_MAPPING control table
- Group partitioned tables with LOB and non-LOB columns based on a group size of 600 GB, resulting in groups like 10, 11, 12, and 13
- Group non-partitioned tables with LOB and non-LOB columns using a similar procedure
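A condensed sketch of that PL/SQL logic is shown below. It uses a running size total to start a new sub_step whenever the group size is exceeded; for brevity, the loop covers only one step, and the table structure and group size are illustrative:

```sql
-- Sketch: drop the groups table if it already exists
BEGIN
  EXECUTE IMMEDIATE 'DROP TABLE table_mapping_groups';
EXCEPTION
  WHEN OTHERS THEN NULL;   -- table didn't exist yet
END;
/

-- Sketch: rebuild the groups table from the control table
CREATE TABLE table_mapping_groups AS
SELECT owner, table_name, size_in_mb, partitioned, lob_column, step,
       0 AS sub_step, 0 AS groupnum
FROM   table_mapping
WHERE  step > 0;

-- Sketch: assign sub_steps within one step using a running size total
-- (repeat per step for partitioned/non-partitioned, LOB/non-LOB tables)
DECLARE
  v_groupsize  NUMBER := 600 * 1024;   -- 600 GB expressed in MB
  v_running_mb NUMBER := 0;
  v_sub_step   NUMBER := 0;
BEGIN
  FOR rec IN (SELECT owner, table_name, size_in_mb
              FROM   table_mapping_groups
              WHERE  step = 1
              ORDER  BY size_in_mb DESC) LOOP
    -- Start a new sub_step when adding this table would exceed the group size
    IF v_running_mb + rec.size_in_mb > v_groupsize AND v_running_mb > 0 THEN
      v_sub_step   := v_sub_step + 1;
      v_running_mb := 0;
    END IF;
    v_running_mb := v_running_mb + rec.size_in_mb;
    UPDATE table_mapping_groups
    SET    sub_step = v_sub_step,
           groupnum = TO_NUMBER('1' || TO_CHAR(v_sub_step))   -- step 1 + sub_step 0 => group 10
    WHERE  owner = rec.owner
    AND    table_name = rec.table_name;
  END LOOP;
  COMMIT;
END;
/
```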
The process accommodates tables requiring special handling by separating them into distinct groups or excluding them from the migration based on specific requirements. The migration process described here is unique and may vary based on individual case requirements. In the following example, tables are grouped based on ${groupsize} of 600 GB:
Group numbers are generated by concatenating two columns, step and sub_step. For example, step 1 and sub_step 0 combine to become group number 10, with 34 tables and a size of 547 GB.
Similarly, the following groups are also created:
In the preceding output, we have grouped tables as close to the desired group size of 600 GB as possible. For example, group 10 contains 34 tables with a total size of 547 GB. A single huge table with a size of 1.5 TB has been kept in a separate task as group 13 because we don't want to split a single table across tasks. Depending on the number of tables and table sizes, some groups will be either smaller or larger than the desired group size of 600 GB. Therefore, a database of 4.4 TB is migrated with 10 tasks by grouping tables based on partitioning, LOB presence, and DML volume.
After the script runs, the following files are generated with the table names:
Table names that are part of the same file, as shown in the following code, should be grouped together in a single task:
Clean up
As part of this demonstration, we created a few tables on the source Oracle database. You can clean up these tables using drop statements after the task grouping has been noted.
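Assuming the helper tables created earlier in this post, a cleanup sketch looks like the following:

```sql
-- Remove the helper tables created for this analysis
DROP TABLE table_mapping_groups;
DROP TABLE mst_dba_tab_mod;
DROP TABLE table_mapping;
```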
Conclusion
In this post, we discussed how you can use existing data dictionary information to analyze database size, workload, and hardware specifications. This analysis helps determine the optimal number of tasks and table groupings for efficient database migration using AWS DMS. By combining database object details with source database hardware information, you can make educated decisions during the migration design phase. Additionally, we recommend referring to Best practices for AWS Database Migration Service to fine-tune your migration tasks.
We invite you to share your feedback, comments, and questions in the comments section.
About the Authors
Manojit Saha Sardar is a Senior Database Engineer with Amazon Web Services and a Subject Matter Expert for AWS DMS, Amazon RDS, and Amazon RDS for PostgreSQL. He has over a decade of experience working with relational databases. At AWS, he works with customers to handle a variety of migration scenarios and assists them in troubleshooting issues related to Amazon RDS for Oracle as well as Amazon RDS for PostgreSQL.
Chirantan Pandya is a Database Engineer with Amazon Web Services and a Subject Matter Expert for AWS DMS. He is based out of Dallas, Texas, specializes in database migration, and has over 15 years of experience. At AWS, he collaborates closely with our customers to provide guidance and technical assistance on database migration projects.