Configure change data capture parameters on Amazon RDS for SQL Server

AWS Database Migration Service (AWS DMS) is a managed migration and replication service that helps you move your database and analytic workloads to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. AWS DMS can migrate data to and from most widely used commercial and open source databases.

SQL Server is a relational database developed by Microsoft. Amazon Relational Database Service (Amazon RDS) for SQL Server makes it straightforward to set up, operate, and scale SQL Server deployments in the cloud. Amazon RDS supports data replication via change data capture (CDC), and having CDC enabled is one of the prerequisites for using Amazon RDS for SQL Server with AWS DMS. CDC captures changes that are made to the data in the tables. It stores metadata about each change, which you can access later.

In this post, we do a deep dive on CDC parameters and explain their impact when configuring AWS DMS, in addition to discussing a few best practices.

Prerequisites

To follow along with this post, you should have familiarity with the following AWS services:

AWS DMS
Amazon RDS for SQL Server

Additionally, you need an AWS account with sufficient privileges to launch the resources necessary for this solution.

How AWS DMS works with Amazon RDS for SQL Server

For Amazon RDS for SQL Server, AWS DMS reads the transaction log (T-log) using Microsoft functions and gets the top 50,000 events by default. AWS DMS starts by querying the database log on the specific partition IDs that relate to the tables defined in the AWS DMS task. Partition IDs are read during each table reload, task restart, and task resume during both full load and CDC. AWS DMS retrieves the object IDs and obtains the data partition IDs corresponding to those object IDs. After you get the partition IDs, you fetch the relevant partitions from the T-log. This cycle runs in intervals of every second.

The following diagram illustrates the architecture. In this example, we use Amazon RDS for SQL Server as a source. The target for the AWS DMS task can be any supported endpoint.

AWS DMS reads the T-log using Microsoft functions, and requires CDC to be enabled on the source database and the tables that will be in scope of the AWS DMS task.

Why is CDC required for AWS DMS?

When CDC is enabled on a table, SQL Server creates a table in the cdc schema for this table. The changed table is populated with the change data and is assigned a name based on the schema and table being tracked. For instance, if you have a table called customer under the dbo schema, a table named cdc.customer_CT would be created on the cdc schema to record all changes against the dbo.customer table.

AWS DMS doesn’t read from the change tables. AWS DMS requires CDC to be enabled to make sure enhanced logging is captured in the T-log for AWS DMS to read the changes. As explained in the previous section, AWS DMS uses Microsoft functions to read the T-log. Consider the following table on the source:

CREATE TABLE [dbo].[dmstest](
 [id] [int] NOT NULL,
 [name] [varchar](50) NULL,
CONSTRAINT [PK_dmstest] PRIMARY KEY CLUSTERED
(
 [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

If you issue an UPDATE statement on this table and update the [name] column, you can see the difference between the [RowLog Contents 0] and [RowLog Contents 1] captured with and without CDC. For the sake of brevity, we have included a snippet of the following query that AWS DMS runs on the source:

select top 50000 [Current LSN], [operation], [RowLog Contents 0], [RowLog Contents 1] -- After Image

The output of the query shows the complete information captured in the T-log in the second record (the UPDATE statement was issued after enabling CDC).

Current LSN	operation	RowLog Contents 0	RowLog Contents 1
0000014f:0000c16d:0002	LOP_MODIFY_ROW	0x1800746573747573657267	0x190074657374757365726162
0000014f:0000c9ba:0016	LOP_MODIFY_ROW	0x30000800020000000200000100190074657374757365726162	0x300008000200000002000001001800746573747573657267

Understanding CDC parameters

For CDC, two jobs are created:

Capture job – Scans through the T-log file to read the changes and pushes those changes to the change tracking tables
Cleanup job – Cleans up the records in the change tracking tables which exhausted the retention period

The following are the CDC parameters that pertain to AWS DMS:

max_trans – Maximum number of transactions to process in each scan cycle
max_scans – Maximum number of scan cycles to run in order to extract all rows from the log
continuous – Indicates whether the capture job is to run continuously (1) or only once (0)
polling_interval – Number of seconds between log scan cycles
retention – Number of minutes that change rows are to be retained in change tables

Although AWS DMS doesn’t read the change tables, you must tune the CDC parameters to control the retention of changes in the T-log.

In the next section, we explain how the parameters max_trans, max_scans, and polling_interval help in retaining the log records in the T-log and how to tune them so that changes are retained for sufficient duration for AWS DMS to capture changes.

CDC parameters in action

We walk through the following steps to illustrate these parameters:

Create a database called dmscdc and a table called dmstestcdc under the database:

create database dmscdc;

use dmscdc;

CREATE TABLE dbo.dmstestcdc(n INT NOT NULL PRIMARY KEY);

Enable CDC on the database dmscdc and the table dmstestcdc:
```
exec msdb.dbo.rds_cdc_enable_db 'dmscdc';

use dmscdc;
exec sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'dmstestcdc' ,
@role_name = 'CDCRole',
@supports_net_changes = 1;
```
You have to tune the CDC parameters to make sure that the log records are retained for sufficient period so that AWS DMS will be able to query the transaction record—that is, the specific log sequence number (LSN) it is looking for in the transaction log file of the source database. These are subjected to the following factors:
- How many transactions is the target behind compared to the source?
- What is the polling interval, specifically the frequency with which the CDC job runs?
- What is the value of the Maxtrans and Maxscans? These parameters determine how many transactions CDC will process on each run.

Configure the capture job as follows. You must stop and start the CDC job every time you change the capture job parameters, which in our case is pollinginterval:

EXECUTE sys.sp_cdc_change_job
@job_type = N'capture',
@pollinginterval = 3599;--Setting the polling interval for 1 hour

EXEC sys.sp_cdc_stop_job @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';

Confirm the CDC parameters by running the following command:

exec sys.sp_cdc_help_jobs;

job_id	job_type	job_name	maxtrans	maxscans	continuous	pollinginterval	retention	threshold
A49487C5-BF3C-4A8C-9385-6AFA7A3541B9	capture	`cdc.dmscdc_capture`	500	10	1	3599	0	0
17511020-59D2-4C9E-BEA9-0578C0D23B11	cleanup	`cdc.dmscdc_cleanup`	0	0	0	0	4320	5000

With the preceding setting, the capture job will process 5,000 records (maxtrans * maxscans) with a frequency of 1 hour.

Insert a few records into the table dmstestcdc to confirm this:
```
DECLARE @max AS INT, @min AS INT;
SET @max = 100000;
SET @min = 1;

WHILE @min <= @max
BEGIN
INSERT INTO dbo.dmstestcdc VALUES(@min);
SET @min=@min+1;
END
```
The capture job reads the preceding transactions from the T-log and marks those as replicated, which is 100,001 records in our case. When the CDC job runs, the capture job will mark those transactions as done.
Check the CDC sessions by running the following query, which should fetch 10 rows. The query will tell us how many records CDC processed, which is 5,000 in our case.
```
SELECT tran_count,start_time,end_time, scan_phase from sys.dm_cdc_log_scan_sessions where scan_phase<>'Aggregate' order by end_time desc
```

tran_count	start_time	end_time	scan_phase
500	2023-12-07 20:34:15.100	2023-12-07 20:34:15.123	Done
500	2023-12-07 20:34:15.067	2023-12-07 20:34:15.083	Done
500	2023-12-07 20:34:15.037	2023-12-07 20:34:15.053	Done
500	2023-12-07 20:34:15.003	2023-12-07 20:34:15.023	Done
500	2023-12-07 20:34:14.963	2023-12-07 20:34:14.990	Done
500	2023-12-07 20:34:14.927	2023-12-07 20:34:14.950	Done
500	2023-12-07 20:34:14.883	2023-12-07 20:34:14.910	Done
500	2023-12-07 20:34:14.840	2023-12-07 20:34:14.870	Done
500	2023-12-07 20:34:14.797	2023-12-07 20:34:14.827	Done
500	2023-12-07 20:34:14.540	2023-12-07 20:34:14.773	Done

The preceding records will be purged from the T-log when the backup of the T-log happens on Amazon RDS for SQL Server, typically every 5 minutes, which helps in maintaining the size of the T-log and moving the LSN forward. The remaining records (95,001) will be picked up by the subsequent run of the capture job.

SQL Server doesn’t flush the T-log until after the transactions have been read by CDC. You need to strike a balance between how many records you’re retaining in the T-log and the AWS DMS replication lag. In this case, we make the capture job parameters aggressive by defining a shorter polling interval; then there can be a scenario that the LSN might be missing from the T-log. To avoid T-log truncation and make sure changes are retained in the T-log for sufficient duration, we recommend setting the polling interval to 1 day by running the following command:

use dbname

EXEC sys.sp_cdc_change_job @job_type = 'capture' ,@pollinginterval = 86399

exec sp_cdc_stop_job 'capture'

exec sp_cdc_start_job 'capture'

Capture historical information of CDC

To monitor the historical information of the capture job, you can query the sys.dm_cdc_log_scan_sessions table. The table contains one row for each log scan session in the current database. It contains up to 32 scan sessions. Run the following query to get the latest 10 records:

SELECT session_id, start_time, end_time, duration, scan_phase,
error_count, tran_count,command_count,last_commit_cdc_time, latency, empty_scan_count, failed_sessions_count
FROM sys.dm_cdc_log_scan_sessions order by end_time desc OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY;

The following is a sample output.

session_id	start_time	end_time	duration	scan_phase	error_count	tran_count	command_count	last_commit_cdc_time	latency	empty_scan_count	failed_sessions_count
0	2023-12-07 19:21:27.283	2023-12-08 00:34:12.837	6	Aggregate	0	125001	125001	2023-12-07 19:50:32.657	17020	0	0
651	2023-12-08 00:34:12.820	2023-12-08 00:34:12.837	0	Done	0	500	500	2023-12-07 19:50:32.657	17020	0	0
650	2023-12-08 00:34:12.790	2023-12-08 00:34:12.810	0	Done	0	500	500	2023-12-07 19:50:31.700	17021	0	0
649	2023-12-08 00:34:12.760	2023-12-08 00:34:12.780	0	Done	0	500	500	2023-12-07 19:50:30.707	17022	0	0
648	2023-12-08 00:34:12.703	2023-12-08 00:34:12.723	0	Done	0	500	500	2023-12-07 19:50:29.757	17023	0	0
647	2023-12-08 00:34:12.670	2023-12-08 00:34:12.693	0	Done	0	500	500	2023-12-07 19:50:28.620	17024	0	0
646	2023-12-08 00:34:12.633	2023-12-08 00:34:12.660	0	Done	0	500	500	2023-12-07 19:50:27.523	17025	0	0
645	2023-12-08 00:34:12.587	2023-12-08 00:34:12.620	0	Done	0	500	500	2023-12-07 19:50:26.527	17026	0	0
644	2023-12-08 00:34:12.530	2023-12-08 00:34:12.573	0	Done	0	500	500	2023-12-07 19:50:25.490	17027	0	0
643	2023-12-08 00:34:12.500	2023-12-08 00:34:12.520	0	Done	0	500	500	2023-12-07 19:50:24.450	17028	0	0

Best practices and known issues

In this section, we discuss some best practices and considerations pertaining to the CDC parameters.

T-Log records truncated on failover in Multi-AZ instances

Always make sure that when the CDC parameters are changed on the primary instance, they are retained on the failover by running the rds_set_configuration command.

For instance, you can run the following sample command on the database dms_test to set the maxtrans and pollinginterval parameters:

USE dms_test;

EXEC sys.sp_cdc_change_job
@job_type = 'capture',
@maxtrans = 10000,
@pollinginterval = 6000;

Run the following commands to make sure these values are retained post-failover:

EXEC rdsadmin..rds_set_configuration 'cdc_capture_maxtrans' , 10000;
EXEC rdsadmin..rds_set_configuration 'cdc_capture_pollinginterval' , 6000;

Planned failovers or maintenance of the AWS DMS replication instance

For Amazon RDS for SQL Server, you need to make sure that every time the AWS DMS task is stopped for any maintenance activity on the source or during a planned scaling of the related AWS DMS replication instance, the capture job doesn’t run. When the capture job runs, the scanned events will be purged from the T-log when the T-log backup happens on Amazon Simple Storage Service (Amazon S3) every 5 minutes.

Stop the capture job by running the following command:
```
exec sp_cdc_stop_job 'capture'
```
Stop the AWS DMS task.
Finish your desired maintenance.
Resume the AWS DMS task.
Wait for the source latency to be 0.
Start the capture job by running the following command:
```
exec sp_cdc_start_job 'capture'
```

The AWS DMS task will fail with the following error message if the aformentioned sequence of steps are not followed:

2023-10-06T15:02:05 [SOURCE_CAPTURE ]E: Failed to access LSN '0000019f:00007fff:0008' in the backup log sets since BACKUP/LOG-s are not available. [1020465] (sqlserver_endpoint_capture.c:813)

If you observe that the LSN is getting truncated on the source after stopping the capture job, there might not be any CDC events in the active T-log that could have prevented the truncation. This can arise when the database is idle or has fewer transactions. In this scenario, the sequence of steps is as follows:

Stop the capture job by running the following command:
```
exec sp_cdc_stop_job 'capture'
```
Make sure there are some transactions or changes in the CDC-enabled database before stopping the AWS DMS task. You can run a script that runs DML statements every second. If you want to create a test script, you can follow the instructions given later in this section.
Stop the AWS DMS task.
Finish your desired maintenance.
Resume the AWS DMS task.
Wait for it to sync by monitoring the source latency.
Stop the script that you set up in Step 2.
Start the capture job by running the following command:

exec sp_cdc_start_job 'capture'

Follow these instructions to set up a script to run the test script mentioned in Step 2. In the following script, you create a table called test_table under the dbo” schema and then enable CDC on the test_table table. You then set up a SQL Server agent job that will insert a record and delete the record into the aforementioned table. This makes sure there are changes in the T-log that need to be picked up by the CDC job and therefore will prevent T-log truncation.

Create the test table:

create table dbo.test_table (id int not null PRIMARY KEY);

Add the new table to CDC:

use dmscdc;
exec sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'test_table' ,
@role_name = 'CDCRole',
@supports_net_changes = 1;

Create a SQL Server agent job in Amazon RDS to insert or delete a record every 1 minute. Use the appropriate owner_login_name and database_name values in your agent job:

USE [msdb]

GO

/****** Object: Job [aws_dms_traffic_to_test_table] Script Date: 10/9/2023 4:17:28 PM ******/

BEGIN TRANSACTION

DECLARE @ReturnCode INT

SELECT @ReturnCode = 0

/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 10/9/2023 4:17:28 PM ******/

IF NOT EXISTS (SELECT
name FROM msdb.dbo.syscategories WHERE
name=N'[Uncategorized (Local)]' AND category_class=1)

BEGIN

EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)

EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'aws_dms_traffic_to_test_table', 

 @enabled=1, 

 @notify_level_eventlog=0, 

 @notify_level_email=0, 

 @notify_level_netsend=0, 

 @notify_level_page=0, 

 @delete_level=0, 

 @description=N'No description available.', 

 @category_name=N'[Uncategorized (Local)]', 

 @owner_login_name=N'admin', @job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object: Step [generate_traffic] Script Date: 10/9/2023 4:17:28 PM ******/

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'generate_traffic', 

 @step_id=1, 

 @cmdexec_success_code=0, 

 @on_success_action=1, 

 @on_success_step_id=0, 

 @on_fail_action=2, 

 @on_fail_step_id=0, 

 @retry_attempts=0, 

 @retry_interval=0, 

 @os_run_priority=0, @subsystem=N'TSQL', 

 @command=N'insert into dbo.test_table values (30);

delete from dbo.test_table where id = 30;

', 

 @database_name=N'dmscdc', 

 @flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'schedule for running DML statements for generating user_traffic', 

 @enabled=1, 

 @freq_type=4, 

 @freq_interval=1, 

 @freq_subday_type=4, 

 @freq_subday_interval=1, 

 @freq_relative_interval=0, 

 @freq_recurrence_factor=0, 

 @active_start_date=20231006, 

 @active_end_date=99991231, 

 @active_start_time=0, 

 @active_end_time=235959, 

 @schedule_uid=N'84a2b2ab-4234-40a3-add4-c04d561ad88f'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

 IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:

GO

On the AWS DMS console, exclude this table from your AWS DMS task using mapping rules if you’re using any wildcards (%) in the table selection rules of your AWS DMS task that would replicate this table:
```
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "1",
"object-locator": {
"schema-name": "dbo",
"table-name": "test_table"
},
"rule-action": "exclude",
"filters": []
},
```

Planned restart or failover of the RDS for SQL Server instance

The RDS for SQL Server agent service restarts whenever there is a reboot of the RDS for SQL Server instance or a failover and this causes the CDC job to rerun after the reboot or failover. To avoid the truncation of the T-log, follow these steps:

Stop the AWS DMS task.
Capture the current maxtrans and maxscans values, which you will revert after the failover:
```
sys.sp_cdc_help_jobs;
```

Change the CDC configuration to set maxtrans and maxscans to 1:

EXEC sys.sp_cdc_change_job @job_type = 'capture' ,@maxtrans = 1, @maxscans = 1
exec sp_cdc_stop_job 'capture'
GO

Run the following statement so that the CDC parameters are retained after the failover:

EXEC rdsadmin..rds_set_configuration 'cdc_capture_maxtrans' , 1;
EXEC rdsadmin..rds_set_configuration 'cdc_capture_maxscans' , 1;

Restart the RDS for SQL Server instance.
Resume the AWS DMS task.
Restart the captured job with the restored configuration. In the following script, we have assumed maxtrans of 500 and maxscans of 10, but you should use the values that were captured in Step 2:
```
EXEC sys.sp_cdc_change_job @job_type = 'capture', @maxtrans = 500, @maxscans = 10
exec sp_cdc_stop_job 'capture'
exec sp_cdc_start_job 'capture'
GO
```

Run the following statement so that the CDC parameters are retained after the failover:

EXEC rdsadmin..rds_set_configuration 'cdc_capture_maxtrans' , 500;
EXEC rdsadmin..rds_set_configuration 'cdc_capture_maxscans' , 10;

Clean up

To avoid incurring recurring charges, clean up your resources:

On the AWS DMS console, delete any AWS DMS task that you set up.
Drop the database by running the following command:
```
EXECUTE msdb.dbo.rds_drop_database N'dmscdc'
```

Conclusion

In this post, we shared the importance of configuring CDC parameters when using Amazon RDS for SQL Server as a source for configuring AWS DMS tasks, and also discussed some best practices. If you have any feedback or questions, leave them in the comments.

About the Authors

Suchindranath Hegde is a Data Migration Specialist Solutions Architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on data migration to the AWS Cloud using AWS DMS.

Abhishek Chaturvedi is a Sr. Database Engineer on the Amazon Web Services DMS team.

Mahesh Kansara is a Database Engineering manager at Amazon Web Services. He closely works with development and engineering teams to improve the migration and replication service. He also works with our customers to provide guidance and technical assistance on various database and analytical projects, helping them improving the value of their solutions when using AWS.

Junu Thankappan is a Senior Database Engineer with Amazon Web Services. He works with the AWS RDS team, focusing on commercial database engines and SQL Server.

AWS Database Blog