AWS Storage Blog

Transferring data in Amazon S3 between AWS GovCloud (US) Regions and commercial AWS Regions using AWS DataSync

AWS users who need to comply with the most stringent US government security and compliance requirements operate their workloads in AWS GovCloud (US), which is architected as a separate partition providing network and identity isolation.

A common use case for AWS GovCloud (US) users is to operate in both AWS GovCloud (US) Regions and commercial AWS Regions, such as US East (N. Virginia). Operating in both partitions may require moving data and artifacts between them, which can be challenging because of the network and identity isolation between the partitions. The process of moving data should also provide the following capabilities:

  • Data encryption in transit
  • Data validation
  • Monitoring and auditing
  • Data transfer scheduling

AWS DataSync is an online data movement service that simplifies and accelerates data migrations to AWS. It also moves data to and from on-premises storage, connected edge locations, other cloud providers, and between AWS Storage services.

In this post, I explain how to copy data between an Amazon S3 bucket outside AWS GovCloud and an S3 bucket in an AWS GovCloud (US) Region using an AWS DataSync task run from the commercial AWS Region. You can use the same approach to run DataSync tasks in an AWS GovCloud (US) Region to copy data between buckets in AWS GovCloud (US) and commercial AWS Regions.

Solution overview

The solution uses DataSync and a DataSync agent deployed as an Amazon Elastic Compute Cloud (Amazon EC2) instance. The architecture of DataSync for this solution is illustrated in Figure 1.

Figure 1: DataSync architecture for transferring data between non-AWS GovCloud Regions and AWS GovCloud

An S3 bucket in the standard (commercial) partition is configured as an S3 DataSync location, whereas the S3 bucket in AWS GovCloud is configured as a DataSync object storage location accessed through a DataSync agent. Both the DataSync agent and the DataSync task are created and run in the commercial AWS Region.

I recommend activating the DataSync agent with a VPC endpoint. This keeps the traffic between the agent and DataSync within a private network (cross-Region EC2 data transfer costs still apply). To keep costs in check, deploy the DataSync agent and the interface endpoint in the same Availability Zone (AZ).

Solution walkthrough

  1. Create an AWS Identity and Access Management (IAM) user with access to an S3 bucket in the AWS GovCloud (US) account.
  2. Configure a network for the Amazon Virtual Private Cloud (Amazon VPC) endpoint.
  3. Deploy a DataSync agent as an EC2 instance in a commercial AWS Region.
  4. Create a source S3 DataSync location outside of AWS GovCloud.
  5. Create a destination object storage DataSync location in AWS GovCloud.
  6. Create a DataSync task outside the AWS GovCloud Region.
  7. Run and monitor the DataSync task.

Prerequisites

The following prerequisites are required to continue with this post:

  • An AWS account in a commercial AWS Region with a source S3 bucket
  • An AWS GovCloud (US) account with a destination S3 bucket

To deploy the solution, use AWS CloudShell and the AWS Management Console. AWS CloudShell has the AWS CLI and jq preinstalled.

Step 1. Create an AWS IAM user with access to the S3 bucket in AWS GovCloud (US)

To create an IAM user and attach an IAM policy with permissions to an S3 bucket in AWS GovCloud (US), I use AWS CloudShell.

1. Define an environment variable for the bucket name and create a file with the minimum permissions policy for DataSync. The following policy allows DataSync to transfer data to the S3 bucket.

BUCKET="S3BUCKET_NAME"
cat > policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": ["arn:aws-us-gov:s3:::${BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionTagging",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:PutObjectTagging"
      ],
      "Resource": ["arn:aws-us-gov:s3:::${BUCKET}/*"]
    }
  ]
}
EOF

2. Create the IAM policy.

POLICY_ARN=$(aws iam create-policy \
    --policy-name DataSyncAgentS3Access \
    --policy-document file://policy.json \
    | jq -r .Policy.Arn)

3. Define an environment variable for the username and create an IAM user.

USERNAME="DataSyncAgent"

aws iam create-user --user-name ${USERNAME} 

4. Attach the previously created IAM policy to the user.

aws iam attach-user-policy --user-name ${USERNAME} --policy-arn ${POLICY_ARN}

5. Create an access key for the user and save the AccessKeyId and SecretAccessKey securely.

aws iam create-access-key --user-name ${USERNAME}
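
Before wiring these credentials into DataSync, you can optionally verify that they work against the GovCloud bucket. The following is a minimal sketch; the profile name govcloud-datasync and the Region us-gov-west-1 are placeholders for your own values.

# Optional check: configure a temporary profile with the new keys (placeholders)
aws configure set aws_access_key_id <AccessKeyId> --profile govcloud-datasync
aws configure set aws_secret_access_key <SecretAccessKey> --profile govcloud-datasync
aws configure set region us-gov-west-1 --profile govcloud-datasync

# The s3:ListBucket permission in the policy above allows this call
aws s3api list-objects-v2 --bucket ${BUCKET} --profile govcloud-datasync --max-items 1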

Step 2. Configure a network for the Amazon VPC endpoint

Set up the VPC, subnet, route table, and security group according to the necessary network requirements for using VPC endpoints. Then create a DataSync interface endpoint to minimize the need for public IP addresses and to ensure that the connection between the DataSync service and the agent doesn't traverse the public internet.
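
As a sketch, you can also create the interface endpoint from the CLI. The VPC, subnet, and security group IDs below are placeholders, and the example assumes the US East (N. Virginia) Region.

# Create a DataSync interface endpoint in the VPC (IDs are placeholders)
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.datasync \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0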

Step 3. Deploy the DataSync agent as an EC2 instance in a commercial Region

After setting up the VPC endpoint, the next step is to deploy an agent as an EC2 instance. Launch the EC2 instance using the latest DataSync Amazon Machine Image (AMI) in the subnet of the DataSync VPC endpoint and assign the security group for the agent. Finally, activate the agent via VPC endpoints using AWS PrivateLink to associate it with your AWS account.
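
If you prefer the CLI, the latest DataSync AMI ID is published as an AWS Systems Manager public parameter. The following sketch launches the agent with placeholder subnet and security group IDs; treat the m5.2xlarge instance type as an assumption to adjust for your workload.

# Look up the latest DataSync AMI for the Region
REGION="us-east-1"
AMI_ID=$(aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region ${REGION} \
    --query 'Parameter.Value' --output text)

# Launch the agent in the same subnet as the DataSync VPC endpoint
aws ec2 run-instances \
    --image-id ${AMI_ID} \
    --instance-type m5.2xlarge \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --region ${REGION}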

Step 4. Create a source S3 DataSync location

Next, I create a DataSync location for the source S3 bucket in a commercial Region, as shown in Figure 2.

1. In the AWS Management Console, navigate to DataSync > Data transfer > Locations.

2. Select Create location.

3. Select Amazon S3 as the Location type.

4. Select your S3 bucket.

5. Leave the Amazon S3 storage class as Standard.

6. Enter a folder prefix into Folder if you only want to copy objects under a specific prefix.

7. Select Autogenerate to create the IAM role automatically. DataSync creates a least-privilege role scoped to this specific bucket. If you would like to use an existing IAM role, select it instead.

Figure 2: Creating a source DataSync S3 location
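
The equivalent CLI call is shown below as a sketch; the bucket name, account ID, role name, and prefix are placeholders for your own values.

# Create the source S3 location (ARNs and prefix are placeholders)
aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::SOURCE_BUCKET_NAME \
    --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/DataSyncS3Role \
    --subdirectory /my-prefix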

Step 5. Create a destination object storage DataSync location

Next, I create a DataSync location for a destination S3 bucket in AWS GovCloud (US) configured as a DataSync object storage location.

1. In the AWS Management Console, navigate to DataSync > Data transfer > Locations and select Create location.

2. Select Object storage as the Location type, and then select the agent deployed in Step 3.

3. For Server, enter s3.<us-gov-region>.amazonaws.com, where <us-gov-region> is the AWS GovCloud (US) Region where the S3 bucket is located.

4. For Bucket name, enter the name of the destination bucket located in the AWS GovCloud (US) Region.

Figure 3: S3 bucket configured as a DataSync object storage location

5. Under the Authentication section, add the access key and secret key created in Step 1.

Figure 4: User authentication key for a DataSync object storage location
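
You can also create this location from the CLI; the following is a sketch with placeholder values. Note that passing the secret key on the command line exposes it to your shell history, so the console flow above may be preferable.

# Create the destination object storage location (values are placeholders)
aws datasync create-location-object-storage \
    --server-hostname s3.us-gov-west-1.amazonaws.com \
    --bucket-name DESTINATION_BUCKET_NAME \
    --agent-arns arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0 \
    --access-key <AccessKeyId> \
    --secret-key <SecretAccessKey>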

Step 6. Create a DataSync task

1. In the AWS Management Console, navigate to DataSync > Data transfer > Tasks and choose Create task.

2. Select Choose an existing location for Source location options, then select the source S3 DataSync location created in Step 4, and then choose Next.

Figure 5: Configure the source location for the DataSync task

3. To configure a destination location, select Choose an existing location, choose the object storage location created in Step 5, and select Next.

Figure 6: Configure the destination location for the AWS DataSync task

4. On the next screen, provide a name and the task options that meet your specific requirements.

Figure 7: Name, Source data, and Transfer options for DataSync task

5. Select the required logging level. For this guide, I select Log all transferred objects and files. If you need to copy millions of objects, I recommend selecting Log basic information such as transfer errors.

6. Autogenerate a CloudWatch log group and resource policy, or select existing ones, and then select Next.

Figure 8: Log-level options for a DataSync task

7. Review your task settings and select Create task.
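
For reference, a hedged CLI equivalent of the task creation looks like the following. The location and log group ARNs are placeholders, and LogLevel=TRANSFER corresponds to logging all transferred objects and files.

# Create the task (ARNs are placeholders)
aws datasync create-task \
    --name commercial-to-govcloud \
    --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0 \
    --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-abcdef01234567890 \
    --cloud-watch-log-group-arn arn:aws:logs:us-east-1:111122223333:log-group:/aws/datasync:* \
    --options LogLevel=TRANSFER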

Step 7. Run and monitor the DataSync task

Once the task is created, start the task.

  1. Choose Start, and then select Start with defaults.
  2. Select the History tab and choose the latest execution.
  3. Monitor the status and performance of the task.
  4. Check the CloudWatch log group for errors.

Figure 9: Monitoring the DataSync task

The Performance section lets you track the amount of data and the number of files transferred, and identify whether any files were skipped during the transfer process. Analyzing these metrics helps you spot potential issues with the data transfer.
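
You can start and monitor the same execution from the CLI; the following sketch uses a placeholder task ARN.

# Start the task and capture the execution ARN (task ARN is a placeholder)
TASK_ARN="arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0"
EXEC_ARN=$(aws datasync start-task-execution \
    --task-arn ${TASK_ARN} \
    | jq -r .TaskExecutionArn)

# Poll the execution status and transfer counters
aws datasync describe-task-execution \
    --task-execution-arn ${EXEC_ARN} \
    --query '{Status:Status,FilesTransferred:FilesTransferred,BytesTransferred:BytesTransferred}'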

Cleaning up

To avoid incurring future charges, delete the resources created in this tutorial.

  1. Delete the IAM user and policy in the AWS GovCloud (US) account.
  2. Delete the DataSync task, locations, and then the agent.
  3. Shut down and terminate the EC2 instance.
  4. Delete the DataSync VPC endpoint.
  5. Delete the S3 buckets.

Conclusion

In this post, I showed how to transfer data securely between commercial AWS Regions and AWS GovCloud (US) Regions using AWS DataSync. Some companies using AWS run workloads across both AWS GovCloud (US) and the standard partition. Scenarios such as interdependent workloads or the need to combine data from various sources drive the need to transfer data in and out of AWS GovCloud (US) Regions.

Although we transferred data in one direction, from the commercial AWS Region to the AWS GovCloud (US) Region, you can use the same approach for transfers in the other direction, from the AWS GovCloud (US) Region to the commercial AWS Region. In that scenario, deploy the DataSync agent as an EC2 instance within the AWS GovCloud (US) account using an AWS GovCloud (US) specific AMI, and provision an IAM user with access to an S3 bucket in the commercial Region. You can also transfer data into or from other AWS storage services, such as Amazon EFS and Amazon FSx, between the commercial AWS Region and the AWS GovCloud (US) Region. The following table shows different data transfer scenarios. The Region where the DataSync agent is deployed is where the DataSync task runs.

Source AWS Region | Source AWS storage service | Destination AWS Region | Destination AWS storage service | Region to deploy DataSync agent
Commercial        | Amazon S3                  | GovCloud               | Amazon S3                       | Commercial or GovCloud
Commercial        | Amazon S3                  | GovCloud               | Amazon EFS/Amazon FSx           | GovCloud
GovCloud          | Amazon S3                  | Commercial             | Amazon S3                       | Commercial or GovCloud
GovCloud          | Amazon S3                  | Commercial             | Amazon EFS/Amazon FSx           | Commercial

Table 1: Different scenarios of transferring data between commercial AWS Regions and AWS GovCloud (US) Regions

For additional details on configuring an Amazon S3 bucket as a DataSync object storage location, visit the DataSync documentation.

Dmitry Frizner

Dmitry Frizner is a Senior Solutions Architect at Amazon Web Services with over 20 years of industry experience. In recent years, he has focused on working with independent software vendor customers, helping them design and build innovative SaaS services on AWS. He is passionate about SaaS, AI, and building effective solutions for business. https://www.linkedin.com/in/frizner/