Backup SQL Server databases to Amazon S3
In this blog post, I will explore the new functionality in SQL Server 2022 that allows you to run native backups directly to Amazon Simple Storage Service (Amazon S3). I will also review best practices for security, along with cost and performance considerations.
Update: We have recently published a new blog on how to Automate rotating IAM credentials for SQL Server backups to Amazon S3 to further secure this architecture if credential rotation is a requirement.
Solution overview
The Backup to URL feature was released with SQL Server 2014 (and SQL Server 2012 SP1 CU2), but until the SQL Server 2022 release, it only worked with Microsoft Azure Blob Storage. With SQL Server 2022, the Backup to URL feature now supports S3-compatible object storage. S3 compatibility means the storage solution uses the Amazon S3 API to provide an S3-compliant interface.
Implementing this feature for SQL Server deployed on an Amazon Elastic Compute Cloud (Amazon EC2) instance requires addressing security requirements and understanding BACKUP command parameters and their relationship to Amazon S3 capabilities. In the following sections, I will also look at other use cases around disaster recovery, offsite backups and associated costs, and finally, the performance capabilities when using the Backup to URL feature.
Note that the Backup to URL feature with S3-compatible object storage is only supported in SQL Server 2022; prior versions are not supported.
Walkthrough
This walkthrough will take you through all the steps required to implement and test the Backup to URL feature of SQL Server 2022 deployed on an Amazon EC2 instance. You will learn how to set up a SQL Server instance for testing the feature, provision required Amazon S3 buckets, create an IAM policy providing access to these buckets, and create an IAM user with respective programmatic access to AWS. Finally, I will explain how you can implement a SQL Server backup to an Amazon S3 bucket and subsequent restores, including the parameters related to the Backup to URL feature.
Launch SQL Server instance
Launch an Amazon EC2 instance, selecting an Amazon Machine Image (AMI) preconfigured with Windows Server 2019 and SQL Server 2022 Enterprise Edition in the same region as your first Amazon S3 bucket. For this example, I have used us-east-1. You can refer to Launch an instance using the new launch instance wizard for step-by-step guidance.
Set up two Amazon S3 buckets
This walkthrough requires two Amazon S3 buckets: a sql-backups-2022 bucket in the us-east-1 Region and a second bucket in the us-east-2 Region with the same name plus the suffix “-ohio” (sql-backups-2022-ohio), as shown in Figure 1. For detailed instructions on creating Amazon S3 buckets, please refer to the Creating a bucket document.
Figure 1. Amazon S3 buckets
Configure an IAM policy
To create an AWS Identity and Access Management (IAM) policy that will define the specific minimum permissions needed to interact with the Amazon S3 buckets, please use the JSON file presented in Figure 2. This JSON file provides permissions s3:ListBucket to allow the listing of the bucket’s contents and s3:PutObject and s3:GetObject to allow the backup files to be written to and read from the bucket. The Resource clause defines the buckets that these permissions will apply to. If you selected different names for your buckets, please update the JSON file accordingly.
To create the IAM policy sql-backups-2022-policy, follow the instructions in the Creating IAM policies document with the JSON file shown in Figure 2.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::sql-backups-2022",
        "arn:aws:s3:::sql-backups-2022-ohio"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::sql-backups-2022/*",
        "arn:aws:s3:::sql-backups-2022-ohio/*"
      ]
    }
  ]
}
Figure 2. IAM policy JSON
Create an IAM user
Now you need to create an IAM user and attach the IAM policy you created in the previous step. Use the instructions documented in Creating IAM users to create a user named sql-backups-user with programmatic access. Attach the policy sql-backups-2022-policy created in the previous step to this user.
When creating the user, take note of the Access Key ID and the Secret Access Key, which will be used to create the SQL Server credentials in the next step.
Create the SQL Server credentials
To provide SQL Server with the permissions needed to read and write files directly to Amazon S3, you will use the Access Key ID and Secret Access Key from the previous step to create a SQL Server credential. More information on SQL Server credentials can be found in Credentials.
- Connect to the Amazon EC2 instance you created earlier, as documented in Connect to your Windows instance.
- Launch SQL Server Management Studio (SSMS) and connect to your SQL Server 2022 instance with a login with sysadmin privileges on SQL Server.
- Open a new query and run the following T-SQL commands to create the credentials.
If you created Amazon S3 buckets with names different from those used in this walkthrough, replace the name sql-backups-2022 in the T-SQL command below with the names you selected; the credential name and the URL used in the backup command must match.
The SECRET parameter is made up of your Access Key ID and Secret Access Key separated by a colon. Please note that if either value contains a colon character, the access key will need to be deleted and regenerated until neither value contains a colon.
CREATE CREDENTIAL [s3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups]
WITH IDENTITY = 'S3 Access Key'
, SECRET = 'AAAAAAAAAAAAAAAAAAAA:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb';
CREATE CREDENTIAL [s3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups]
WITH IDENTITY = 'S3 Access Key'
, SECRET = 'AAAAAAAAAAAAAAAAAAAA:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb';
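To confirm that both credentials were created, you can optionally query the sys.credentials catalog view; the two s3:// URLs above should appear in the name column:
SELECT name, credential_identity, create_date
FROM sys.credentials;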
Back up a database to Amazon S3
The arguments used in the T-SQL BACKUP DATABASE command will be:
- TO URL = ‘xxx’
This provides the complete path to the bucket and backup file name.
- MIRROR TO URL = ‘xxx’
This enterprise-only feature allows for a second set of backup files to be created in a second bucket.
- WITH FORMAT
This allows existing files in your bucket to be overwritten. Without it, the command will fail if a file already exists. AWS recommends using Amazon S3 versioning to protect files from being accidentally deleted. Removing the argument and providing error handling logic for the failure could also be a viable approach.
- COMPRESSION
Amazon S3 does not compress objects, so enabling SQL Server backup compression during the creation of the backup file produces the smallest possible file to send to the Amazon S3 bucket and reduces storage costs.
- MAXTRANSFERSIZE = 20971520 (20 MB)
The MAXTRANSFERSIZE argument defines, in bytes, the largest unit of transfer used when sending the backup to Amazon S3. A single backup file can be broken into a maximum of 10,000 parts, each up to MAXTRANSFERSIZE in size. Striping the backup across up to 64 separate URLs provides a total size of approximately 12.2 TB. The formula for calculating the maximum backup size is:
{ MaxTransferSize } * { # of Parts } * { # of URLs }
Using MAXTRANSFERSIZE = 20971520 and striping the backup across up to 64 URLs provides the following result for the maximum backup size:
{ 20971520 } * { 10000 } * { 64 } = 13,421,772,800,000 B =~ 12.21 TB
In the final backup command, db1 is the database being backed up and /backups/db1.bak places the backup file in a folder named “backups”. Amazon S3 uses a flat structure but supports the folder concept to assist in the logical grouping of files. More details are provided in Organizing objects in the Amazon S3 console using folders.
If you created Amazon S3 buckets with names different from those used in this walkthrough, replace the name sql-backups-2022 in the T-SQL command below with the names you selected.
BACKUP DATABASE db1
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak'
WITH FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
Executing this command in SSMS will produce an output showing the completed backup details.
Processed 536 pages for database 'db1', file 'db1' on file 1.
Processed 2 pages for database 'db1', file 'db1_log' on file 1.
BACKUP DATABASE successfully processed 538 pages in 0.903 seconds (4.650 MB/sec).
Completion time: 2022-09-08T13:55:48.1756888-05:00
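As an optional sanity check, you can confirm that the backup written to the bucket is readable by pointing RESTORE VERIFYONLY at the same URL, which reuses the credential created earlier:
RESTORE VERIFYONLY
FROM URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak';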
To improve performance, stripe the backup across multiple files by referencing multiple URLs. There are 5 URLs in the following example, but a maximum of 64 can be provided:
If you created Amazon S3 buckets with names different from those used in this walkthrough, replace the name sql-backups-2022 in the T-SQL command below with the names you selected.
BACKUP DATABASE db1
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part1.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part2.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part3.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part4.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part5.bak'
WITH FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
You can also send the backup to a second Amazon S3 bucket by leveraging the MIRROR TO URL argument:
BACKUP DATABASE db1
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part1.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part2.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part3.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part4.bak',
URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-part5.bak'
MIRROR TO URL = 's3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups/db1-part1.bak',
URL = 's3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups/db1-part2.bak',
URL = 's3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups/db1-part3.bak',
URL = 's3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups/db1-part4.bak',
URL = 's3://sql-backups-2022-ohio.s3.us-east-2.amazonaws.com/backups/db1-part5.bak'
WITH FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
Areas to note when using the MIRROR TO URL argument:
- The number of URLs passed in the second argument must match the number of URLs used in the first argument. The 64-URL limit is cumulative, meaning that, in the above example, you would be limited to 32 URLs in each argument. This halves the maximum backup size that can be taken.
- Up to three MIRROR TO URL clauses can be used, reducing the maximum number of URLs per target to 16 and the total backup size to ~3 TB. Consider using Amazon S3 object replication instead of the MIRROR TO URL argument if the full 12 TB is needed.
- If the file exists in one URL path but not in the mirror URL path, the backup will fail regardless of the WITH FORMAT argument.
- MIRROR TO URL is an Enterprise edition feature.
There is also a behavior change to note between backing up to Microsoft Azure Blob Storage and backing up to Amazon S3. When your URL points at Azure Blob Storage, you can back up to DISK and URL in the same command, whereas when your URL points at Amazon S3, both the primary and mirror targets must be Amazon S3. Mixing backup targets will cause the following error:
Msg 3298, Level 16, State 5, Line 183
Backup/Restore to URL device error: Backup to S3 remote storage cannot be mixed with other backup device types. Ensure all Primary and Mirror Urls use S3 path.
Full, differential, and transaction log backups are all available and can be combined with the COPY_ONLY argument to allow for out-of-chain backups:
If you created Amazon S3 buckets with names different from those used in this walkthrough, replace the name sql-backups-2022 in the T-SQL command below with the names you selected.
BACKUP DATABASE db1
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak'
WITH COPY_ONLY, FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
BACKUP DATABASE db1
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1-diff.bak'
WITH COPY_ONLY, DIFFERENTIAL, FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
BACKUP LOG [db1]
TO URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.log'
WITH COPY_ONLY, FORMAT, COMPRESSION, MAXTRANSFERSIZE = 20971520;
Restore a database from Amazon S3
You can restore from URL either to replace the current database or to restore it as a new database, using the standard RESTORE syntax:
If you created Amazon S3 buckets with names different from those used in this walkthrough, replace the name sql-backups-2022 in the T-SQL command below with the names you selected.
RESTORE DATABASE db1
FROM URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak'
WITH REPLACE;
RESTORE DATABASE [db1-copy]
FROM URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak'
WITH FILE = 1,
MOVE N'db1' TO N'C:\SQL2022\Data\db1-copy.mdf',
MOVE N'db1_log' TO N'C:\SQL2022\Log\db1-copy_log.ldf'
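If you are unsure which logical file names to use in the MOVE clauses, one option is to read them from the backup itself with RESTORE FILELISTONLY before running the restore:
RESTORE FILELISTONLY
FROM URL = 's3://sql-backups-2022.s3.us-east-1.amazonaws.com/backups/db1.bak';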
Cost considerations
There are two primary components of the cost of the solution: the actual storage cost and the data transfer cost.
Amazon S3 storage tier
The default Standard tier is not the most cost-effective choice for most SQL Server backup files, as customers will typically only access them during a data recovery effort. There are several alternate storage tiers to choose from based on expected retention periods and access patterns. The details of each tier are documented in Amazon S3 storage classes, but as a general recommendation where retention and access patterns are mixed or unknown, AWS recommends that you use the Amazon S3 Intelligent-Tiering storage class.
By default, the files written to the bucket will use the Standard tier. To transition the backup files from the Standard tier to S3 Intelligent-Tiering, you will need to create a lifecycle rule, as documented in Transitioning objects using Amazon S3 lifecycle.
- Provide a lifecycle rule name such as “Convert to Intelligent Tiering”
- For the rule scope select Apply to all objects in the bucket and check I acknowledge that this rule will apply to all objects in the bucket
- Lifecycle rule actions will be Move current versions of objects between storage classes and Move noncurrent versions of objects between storage classes
- Transition current versions should be set to Intelligent-Tiering and zero days after object creation
- Transition noncurrent versions should be set to Intelligent-Tiering and zero days after objects become noncurrent
Figure 3 shows how to set up a tiering option.
Figure 3. Lifecycle rule configuration
Data transfer costs
Writing data (ingress) to your Amazon S3 bucket from either your on-premises SQL Server instance or your Amazon EC2 SQL Server instance in the same region as your bucket is always free.
Reading data (egress) from your Amazon S3 bucket incurs costs for actions, such as listing the objects stored in the bucket or downloading a file from Amazon S3 to a local drive. Read more about egress pricing in Amazon S3 pricing.
There are also costs to consider when using the AWS Command Line Interface (AWS CLI) to list the objects within your bucket. A more cost-effective approach is to use your database backup history in the msdb database to determine the Amazon S3 contents and build restore commands, as shown in the following sample T-SQL script:
SELECT bs.database_name,
       bs.backup_finish_date,
       'RESTORE DATABASE [' + bs.database_name + '] FROM URL = ''' + bmf.physical_device_name + ''' WITH REPLACE;' AS tsql
FROM [msdb].[dbo].[backupmediafamily] bmf
INNER JOIN [msdb].[dbo].[backupset] bs ON bmf.media_set_id = bs.media_set_id
ORDER BY bs.database_name, bs.backup_finish_date DESC;
When considering a replication strategy for disaster recovery or other scenarios, the replication of the contents of a bucket to a second region using Cross-Region Replication incurs cross-region data egress costs that differ based on the source and target regions. Using the MIRROR TO URL argument will avoid these egress costs, but will do so at the cost of smaller maximum backup sizes, potentially longer backup times, and higher utilization of your internet connection.
Performance
When testing the maximum number of URLs (64) in a backup command with a 20 MB MAXTRANSFERSIZE for a single 144 GB database backed up across 64 files, the throughput shown in Figure 4 peaked around the 16x EC2 instance size at approximately 2,403 MB/s, against an EC2 networking capacity of 37.5 Gbps. Adding network capacity via larger Amazon EC2 instances did not improve performance, highlighting that the read performance of the source database's underlying storage eventually becomes the limiting factor.
Running parallel database backup operations can provide similar throughput for each thread to the Amazon S3 bucket, but you will need to test with your own workloads, as storage and compute limits can also become bottlenecks. The correct number of files will need to be determined for each workload, but it should be large enough to accommodate the overall size of the database backup being created.
Figure 4. SQL Server Backup to Amazon S3
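While experimenting with different file counts and instance sizes, one simple way to observe progress is to query the sys.dm_exec_requests dynamic management view, which reports percent_complete for running BACKUP and RESTORE commands:
SELECT session_id,
       command,
       percent_complete,
       estimated_completion_time / 1000 AS estimated_seconds_remaining
FROM sys.dm_exec_requests
WHERE command LIKE 'BACKUP%' OR command LIKE 'RESTORE%';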
Clean up resources
When you’re finished using the resources in this blog post, clean up the AWS resources to avoid incurring unwanted charges. Specifically, delete the elements created as part of the walkthrough:
- Amazon S3 buckets – For instructions, refer to Deleting a bucket.
- EC2 instance – For instructions, refer to How do I delete or terminate my Amazon EC2 resources?
- IAM policy created – For instructions, refer to Deleting IAM policies.
- IAM user created – For instructions, refer to Deleting an IAM user.
Conclusion
This blog post highlighted the new capabilities of SQL Server 2022 with Amazon S3 and the considerations of security, performance, and cost.
By leveraging an IAM policy that provides the minimum required permissions and Amazon S3 Intelligent-Tiering to deliver a cost-effective automated use of the Amazon S3 storage tiers, you can ensure a secure and cost-effective backup strategy with the SQL Server 2022 Backup to URL feature.
AWS can help you assess how your company can get the most out of the cloud. Join the millions of AWS customers that trust us to migrate and modernize their most important applications in the cloud. To learn more about modernizing Windows Server or SQL Server, visit Windows on AWS. Contact us to start your modernization journey today.