AWS Database Blog
Best practices for configuring parameters for Amazon RDS for MySQL, part 2: Parameters related to replication
This blog post was last reviewed or updated May, 2022.
In the previous blog post of this series, I discuss MySQL parameters used to tune and optimize Amazon Relational Database Service (Amazon RDS) for MySQL performance and best practices related to them. In today’s post, I discuss the most important MySQL parameters used for replication configuration and optimizing replication in an RDS MySQL environment.
Note: The default values mentioned in this post apply to Amazon RDS for MySQL 5.7. You can find the default values for Amazon RDS for MySQL 8.0 in the corresponding MySQL documentation or in the RDS for MySQL 8.0 default parameter group which you can access using the AWS CLI or the Amazon RDS console.
Parameters related to both single and multi-threaded replication
Following, I list parameters that you can use for both single and multi-threaded replication, with best practice suggestions for configuring each.
sync_binlog
The sync_binlog
option controls how MySQL flushes the binary log to disk.
The default value of sync_binlog
is 1. On your replication master, when sync_binlog
is set to 0 it does no synchronizing to disk—in this case, the server relies on the operating system to flush the binary log’s contents from time to time, as for any other file. Thus, this setting improves the replication performance by avoiding having MySQL flush the binary log (binlog) to disk. This approach gives the best performance.
However, if MySQL crashes, the binary log will likely be missing several transactions. It’s generally necessary to rebuild the replicas to ensure that they are in sync with the master. For RDS read replication, which has disabled backup and which has no “chained” read replica, sync_binlog
doesn’t apply because there’s no need to sync the binlog. We recommend that you have your backup retention period set to 0 on the replicas to avoid binary log generation.
However, if you want to minimize data loss the sync_binlog
parameter should be set to 1 on the replica source. The best values to set depend on whether you prioritize performance or durability.
binlog_row_image
You can specify two event formats supported by binary logs using the parameter binlog_format
. These are STATEMENT and ROW. Using the row-based format enables you to log nondeterministic queries and means that temporary tables aren’t created on replicas. On the other hand, the statement-based format is more compact than the row-based one.
You can use the parameter binlog_row_image
to control how much information is logged in the binary log for a row-based event. The state of a row is represented by an “image” in a binary log. For any row-based event, there are two kinds of images, the before image and the after image. The row before a change is made is represented by the before image. The row after a change is made is represented by the after image. Not all the events have before and after image.
The following table summarizes various row-based events and their available images. INSERT statements create the Write_rows
events.
Event Type | Before Image | After Image |
Write_rows | Included | |
Update_rows | Included | Included |
Delete_rows | Included |
The following example shows these images in detail using the mysqlbinlog utility to read these images in human-readable format.
Following is the output of the mysqlbinlog --base64-output=decode-rows --verbose
command for the events generated by the preceding INSERT
, UPDATE
, and DELETE
statements. The section after where
represents the before image. The section after SET
represents the after image.
This INSERT
example has an after image.
This UPDATE
example has both a before image and an after image.
This DELETE
example has only a before image.
For the preceding example, all the column details are recorded with default configuration. The binlog_row_image
parameter helps control which columns and image are logged for the preceding events.
The supported values for the parameter are as follows:
- full – log all columns in the before and after image.
- minimal – log only changed columns in the after image, and the columns needed to identify rows in the before image.
- noblob – log all columns, except BLOB and TEXT columns, unless they change.
The following sections discuss these format values. When deciding which image format to use, consider the advantages and disadvantages of these formats and their application to your use case and workload.
Full
For the full
value, an example from the update_row
event from preceding example follows, which has both a before image and an after image.
Even though the second column val1
is not updated, it is included in the before image and the after image.
A couple advantages of using full
images are these:
- All the storage engines support this format.
- You can use this format to roll back data manipulation language (DML) statements. This format logs all the columns of the before image of a row for an event that deletes rows. Thus, you can reconstruct the rows can be reconstructed by reinserting the value logged in the before image.
However, the disadvantage of using binlog_row_image=full
is that it takes more space in binary logs to log the events. This can be a problem particularly if there are many update statements with columns having BLOB or TEXT values.
Minimal
The minimal
option logs only the information needed to apply the changes:
- Before image – only the primary key value is logged.
- After image – columns where the values have changed are logged.
For the preceding insert, update, and delete example, if you use the minimal
option the three events are as follows.
The INSERT
example has only an after image:
The UPDATE
example has both a before image and an after image:
The DELETE
example has only a before image:
As seen from the preceding example, there is no difference between the full
and minimal
options for the Write_rows
events due to the nature of the INSERT
statement. However, the size of the images logged for the UPDATE
and DELETE
statements corresponding to update rows and delete rows events are much smaller with the minimal option than with the full option.
The advantages of using the minimal image format are as follows:
- The binary log events are smaller for
UPDATE
andDELETE
The savings can be considerable if most of the updates are single column and there are many large columns with varchar, char, text, blob, and similar data types. - The small binary logs not only save space, but also cause less disk I/O and network traffic for replication.
The disadvantages of using the minimal
image format are as follows:
- This approach works only for tables with explicit primary key or a not-NULL unique index.
- Because not all columns are logged in the after image but rather only changed columns are logged, the requirements for column definition on the replica instance is stricter. Following are the requirements:
- Both the tables must have the same primary keys
- The data types must be same
- The order of the columns must be same
- The master and replica must have same columns. Otherwise, the replica can go out of sync with no notice due to an update event.
noblob
The noblob
image is same as the full
image except that a BLOB or text column is included in the after image only if the column is required or changed.
slave_pending_jobs_size_max
This parameter defines the maximum amount of memory allocated to replica worker queues that hold events that haven’t yet been applied. The default value is 16 MB. This value should not be less than the master’s value for max_allowed_packet
(see following). If it is, it can cause the replica worker queue to be full, even though there are events coming from master that have to be processed.
max_allowed_packet
Ensure that the same value for max_allowed_packet
is configured on both master and read replica. Otherwise, the difference might cause replication lag due to the constraint of the low max_allowed_packet
value on the replica. I’ll talk about this topic in more detail in the next part of the blog series, about parameters related to connectivity and timeout.
innodb_rollback_on_timeout
If this parameter is specified, a transaction timeout causes InnoDB to abort and roll back the entire transaction, not only the last query on which execution succeeded. Doing this helps avoid various replication errors related to duplicate entry. The default value is OFF, which disables this feature.
innodb_flush_log_at_trx_commit
As I discuss in part 1 of this blog series, about parameters related to performance, setting innodb_flush_log_at_trx_commit
to 0 or 2 on the replica instance reduces the frequency at which the log buffer is flushed to disk. This in turn reduces the performance hit from disk writes. However, as mentioned, doing this comes with the tradeoff of potentially losing some transactions in case of a crash.
query_cache settings
We recommend that you disable the query cache in read replicas. We recommend this because when a read replica writes to the database, most of the time it needs to invalidate the query cache.
To disable the query cache, you can set query_cache_type
to 0 and query_cache_size
to 0. For more details about working with the query cache, see part 1 of this blog series.
read_only
You can use this parameter to allow updates from clients to the read replica. The default value is TrueIfReplica
. For a replica instance, TrueIfReplica
sets the value to ON (1) and disables any write activity from clients. For a master/writer instance, TrueIfReplica
sets the value to OFF (0) and thus enables write activity from clients on the master/writer instance.
You enable this latter functionality by setting read_only
to 1 in the parameter group. Turning off this parameter by setting read_only
to 0 makes the read replica writable. However, as a best practice for RDS MySQL read replicas, we don’t recommend changing read replicas to writable this way for a long period. This setting can cause replication errors and issues with data consistency.
log_bin_trust_function_creators
To enable functions, procedures, or triggers when binary logging is enabled requires the MySQL SUPER privilege. This privilege is restricted for RDS MySQL DB instances. However, you can enable functions, procedures, and triggers in RDS MySQL instance when binary logging is enabled. You do so by setting the log_bin_trust_function_creators
parameter value to 1. Otherwise, you might get an error like the following:
The default value is 0.
Parameters related to multi-threaded replication only
Following, I list parameters related to multi-threaded replication, with best practice suggestions for configuring each.
slave_parallel_type
This parameter is available from MySQL version 5.7. Its value defines the policy that decides which transactions run in parallel when multi-threaded replication is enabled. To do so, it uses a nonzero value for the slave_parallel_workers
parameter.
This parameter can have two values, LOGICAL_CLOCK and DATABASE. The default value is DATABASE. When LOGICAL_CLOCK is set, transactions that are part of the same binary-log group commit on the master run in parallel on the replica. To provide additional parallelization, dependencies between transactions are tracked using time stamps.
If the value is set to DATABASE, transactions running on different databases are applied in parallel, but no parallelization within a schema is possible with this value. This value can apply when data in different databases is partitioned and updated concurrently and independently on the master. To use DATABASE, there must be no cross-database constraints.
To decide which value to use for this parameter, consider the preceding constraints and whether you want parallelization within schemas or between schemas. For parallelization within a schema, LOGICAL_CLOCK is the only option.
To use slave_preserve_commit_order=1
(discussed following), you must use LOGICAL_CLOCK for slave_parallel_type
.
There is a known issue when slave_parallel_type
is LOGICAL_CLOCK and slave_preserve_commit_order
is enabled that multi-threaded slave (MTS) workers can hang for a number of seconds equal to the innodb_lock_wait
timeout. This issue is due to MySQL Bug 82400 and MySQL Bug 25082593 . The workaround is to set the replica to use a transaction isolation level of READ COMMITTED. You can do this by using the parameter tx_isolation
.
slave_parallel_workers
This parameter sets the number of worker threads on the replica instance running transactions in parallel. The default value is 0, and that value disables parallel execution of worker threads. Allowed values are 0–1,024.
Using parallel transaction execution on replicas provides better scalability in replication. However, this approach works only if master and replicas both are on at least MySQL 5.6 or higher. When slave_parallel_workers
is set to greater than zero, you can’t retry transactions and slave_transaction_retries
is treated as equal to 0. Setting this value for this parameter wasn’t always honored correctly in MySQL 5.6.3 (bug 13334470), which has been fixed in 5.6.4. Also, bug 84415 can affect things when this parameter is enabled.
In MySQL 5.6, parallel replication uses only one thread per schema so can be effectively used where there are multiple databases. In 5.7, this approach can also be used for workloads within schemas.
Increasing the slave_parallel_workers
value doesn’t improve replication performance linearly. How to estimate the best value of this parameter depends on the workload and needs testing by simulating an actual workload and monitoring replication lag.
You can enable some of the performance schema instrumentation related to transaction to record executed transactions. Then, to know how many transactions are executed by each replication thread, check the performance schema tables performance_schema.events_transactions_summary_by_thread_by_event_name
and performance_schema.replication_applier_status_by_worker
. Doing so helps you see if all threads are used properly and then adjust the slave_parallel_workers value as needed. For more information, see this useful blog post from Percona.
How to tune this value depends on your workload. If your read replica is not being used for any read operations but only as a standby replica, you can set this value to the number of vCPUs for the instance. If your read replica is being used to serve read workloads and not as a standby, test your workload after you have set the value.
slave_preserve_commit_order
This parameter is available in MySQL 5.7. It ensures that transactions are externalized on the replica in the same order as they appear in the replica’s relay log. The default value is 0, which disables this feature.
We recommend this parameter to avoid a replication gap for multi-threaded replication setup when slave_parallel_type
is set to LOGICAL_CLOCK in MySQL 5.7. This parameter makes sure that replicas never enter a state that the master wasn’t in.
This option requires that as prerequisites you enable automatic backup (and thus binary logging) and the log_slave_updates
parameter. The log_slave_updates
parameter is enabled by default. Enabling binary logging on the replication instance can affect storage utilization and performance. Therefore, we recommend that you test replication with and without binary logging (automatic backup).
If the multi-threaded slave option is enabled, transactions can run in parallel. When the slave_preserve_commit_order
parameter is set, the executing replica waits to commit the transaction until all previous transactions are committed. When a replica thread is waiting for other worker threads for previous transactions to commit, its status is reported as waiting for preceding transaction to commit.
If this option is not enabled, there is a chance of gaps in the sequence of transactions run from the replay log. The default value is 0, which disables this parameter. However, in this case Exec_master_log_pos
might appear to be behind the position of the current master binary log file, which the SQL thread has read and executed to.
Conclusion
The preceding parameters are the most important parameters that can affect replication performance and stability in RDS MySQL. Following the best practices that I discuss for these parameters helps ensure that your MySQL RDS replicas are running with the minimum possible lags and that you avoid other operational issues.
In the next part of this blog series, I discuss commonly used MySQL parameters for implementing various security features, some of the parameters that help in managing an RDS DB instance’s operation and troubleshooting issues, and some useful parameters related to collation and character sets.
About the author
Saikat Banerjee is a cloud support engineer with Amazon Web Services.