AWS Storage Blog
Migrating Oracle Cloud Infrastructure Object Storage to Amazon S3 using AWS DataSync
Users face challenges in their digital transformation journey involving the migration of data across various platforms, on-premises file systems, and other cloud services. When using other cloud providers, scenarios arise where the seamless transfer of data becomes essential. Whether executing a one-time data transfer or integrating it into a scheduled workflow, minimizing business downtime is crucial. Users prioritize secure data transfer methods and prefer leveraging native approaches to address these tasks.
Organizations migrate data to Amazon Simple Storage Service (Amazon S3) from other cloud providers for several reasons, such as data consolidation, data lake formation, centralized log management, disaster recovery (DR), and cost optimization. In this scenario, the process involves transferring data from Oracle Cloud Infrastructure (OCI) Object Storage into Amazon S3 for business applications to use.
Object storage is a fundamental storage service offered by cloud providers, such as AWS, OCI, and others. With more users moving data and services to the cloud, the need to transfer data from one cloud provider to another is becoming a more frequent occurrence.
In this post, we demonstrate how to migrate object storage data from OCI Object Storage to Amazon S3 using AWS DataSync, a fully managed data transfer service that simplifies moving data between OCI Object Storage and Amazon S3.
Solution overview
DataSync makes data more accessible by moving it to and from AWS storage, on-premises file systems, and other cloud storage services. DataSync is used to migrate Amazon S3 compatible object storage data between other cloud providers and AWS storage services. To implement this solution, a DataSync agent is deployed and configured in AWS. The DataSync agent is deployed as an Amazon Elastic Compute Cloud (Amazon EC2) instance that communicates with OCI Object Storage through an Amazon S3 compatible API endpoint.
For security purposes during the transfer process, you need to create appropriate IAM roles, policies and security groups in both OCI and AWS. These roles and groups are used to grant permissions to access the data and resources needed for the migration.
The high-level DataSync architecture shows a DataSync agent that transfers objects from a source OCI Object Storage bucket to the destination S3 bucket. To keep traffic between the agent and the DataSync service off the public internet, we recommend activating the agent with a DataSync VPC endpoint. To learn more about how DataSync moves data between storage systems, refer to the documentation: How DataSync transfers work.
Figure 1. Architecture diagram for migrating data from OCI Object Storage to Amazon S3 using DataSync
In this post, we go through the following steps:
- Create a secret key in OCI.
- Create an S3 bucket as the destination.
- Configure a network for the Amazon Virtual Private Cloud (Amazon VPC) endpoint.
- Deploy a DataSync agent as an EC2 instance.
- Configure OCI Object Storage as the source and Amazon S3 as the destination DataSync location.
- Create, configure, and start the DataSync task.
- Verify the data transferred.
Prerequisites
The following prerequisites are necessary to follow along with this post:
- An AWS account with AWS Management Console access to DataSync, Amazon S3, and Amazon CloudWatch.
- Access to AWS Command Line Interface (AWS CLI).
- Access to OCI Object Storage with source objects to transfer.
Source OCI Object Storage setup
This section illustrates the OCI Object Storage bucket in the US-Ashburn-1 region. Refer to the OCI Object Storage Service API documentation for available OCI regions.
Figure 2. OCI Object Storage bucket
The source OCI Object Storage bucket contains three folders: Source Folder 1, Source Folder 2, and Source Folder 3. Each folder contains three text files that are transferred to Amazon S3.
Figure 3. OCI Object Storage folders inside bucket
Steps for migration
Step 1: Create a secret key in OCI
- Open the Profile menu and select User Settings or your account name.
- Under Resources, select Customer Secret Keys.
- Enter a description for the key and select Generate Secret Key. OCI generates an Access and Secret Key pair.
- Copy the secret key immediately. For security reasons, you won’t be able to see it again after you close the dialog box.
Note: Secret keys are sensitive information. Make sure to keep them safe and only share them with authorized users. Refer to OCI documentation for additional details on secret keys.
Figure 4. Create OCI Customer secret key
Step 2: Create an S3 bucket as the destination
Create an S3 bucket and copy its Amazon Resource Name (ARN) from the Properties tab.
Figure 5. Create S3 bucket
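If you prefer to script this step, the following is a minimal sketch using the AWS SDK for Python (boto3). The function names and the bucket name in the usage note are hypothetical, and the sketch assumes the standard `aws` partition. It shows how the bucket ARN that DataSync needs later is derived from the bucket name.

```python
def s3_bucket_arn(bucket_name: str, partition: str = "aws") -> str:
    """Build the ARN of an S3 bucket (bucket ARNs carry no Region or account ID)."""
    return f"arn:{partition}:s3:::{bucket_name}"


def create_destination_bucket(s3_client, bucket_name: str, region: str) -> str:
    """Create the destination bucket and return its ARN.

    `s3_client` is expected to be a boto3 S3 client, for example
    boto3.client("s3", region_name=region).
    """
    if region == "us-east-1":
        # us-east-1 is the default and is not passed as a LocationConstraint.
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return s3_bucket_arn(bucket_name)
```

For example, `s3_bucket_arn("my-destination-bucket")` returns `arn:aws:s3:::my-destination-bucket`, which matches the value shown on the Properties tab in the console.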
Step 3: Configure a network for the Amazon VPC endpoint
To prepare your network for using Amazon VPC endpoints, set up the VPC, subnet, route table, and security group according to the network requirements. Next, create a DataSync VPC interface endpoint so the connection between the DataSync service and the agent does not go through the public internet. The endpoint type is a DataSync VPC endpoint (for example DataSync service name: com.amazonaws.us-east-2.datasync).
Figure 6. Create Amazon VPC endpoint
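As a rough sketch of the same step in code, the helper below assembles the arguments for the boto3 EC2 `create_vpc_endpoint` call. The helper names and all resource IDs are hypothetical placeholders; only the service name pattern comes from this post.

```python
def datasync_service_name(region: str) -> str:
    """Return the DataSync VPC endpoint service name for a Region."""
    return f"com.amazonaws.{region}.datasync"


def vpc_endpoint_params(vpc_id: str, subnet_id: str,
                        security_group_id: str, region: str) -> dict:
    """Assemble keyword arguments for ec2.create_vpc_endpoint."""
    return {
        "VpcEndpointType": "Interface",  # DataSync uses an interface endpoint
        "VpcId": vpc_id,
        "ServiceName": datasync_service_name(region),
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("ec2", region_name=region).create_vpc_endpoint(**params)`.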
Step 4: Deploy a DataSync agent on an EC2 instance
To deploy a DataSync agent as an EC2 instance:
- Launch the EC2 instance with the latest DataSync AMI in the subnet from Step 3.
- Assign the security group for agents to the EC2 instance.
- Activate the DataSync agent to associate it with your AWS account.
Figure 7. Create DataSync agent
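The sketch below illustrates, under stated assumptions, how the agent registration could be expressed with boto3. It assumes you have already launched the instance and obtained an activation key from the agent. The helper names, account ID, and resource IDs are hypothetical. The latest DataSync AMI ID is published in the AWS Systems Manager parameter referenced by the constant.

```python
# Systems Manager public parameter that holds the latest DataSync AMI ID.
DATASYNC_AMI_PARAMETER = "/aws/service/datasync/ami"


def ec2_arn(region: str, account_id: str,
            resource_type: str, resource_id: str) -> str:
    """Build an EC2 resource ARN, for example for a subnet or security group."""
    return f"arn:aws:ec2:{region}:{account_id}:{resource_type}/{resource_id}"


def create_agent_params(activation_key: str, agent_name: str, region: str,
                        account_id: str, vpc_endpoint_id: str,
                        subnet_id: str, security_group_id: str) -> dict:
    """Assemble keyword arguments for datasync.create_agent with a VPC endpoint."""
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [ec2_arn(region, account_id, "subnet", subnet_id)],
        "SecurityGroupArns": [
            ec2_arn(region, account_id, "security-group", security_group_id)
        ],
    }
```

The AMI ID can be read with `boto3.client("ssm").get_parameter(Name=DATASYNC_AMI_PARAMETER)["Parameter"]["Value"]`, and the assembled arguments passed to `boto3.client("datasync").create_agent(**params)`.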
Step 5: Configure OCI Object Storage as source and Amazon S3 as destination DataSync location
1. Configure OCI Object Storage as source DataSync location.
a. Open the DataSync console, expand Data transfer, then choose Locations and Create location.
b. For Location type, choose Object Storage.
c. Select the agent created in Step 4.
d. In Server, enter the following endpoint: <object_storage_namespace>.compat.objectstorage.<region>.oraclecloud.com
Note: Replace <object_storage_namespace> with your OCI Object Storage namespace and <region> with your OCI region.
e. For Bucket name, enter the name of the OCI Object Storage bucket.
f. Under Authentication, add the OCI access and secret key created in Step 1.
g. Choose Create location.
Figure 8. Create source location
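To make the endpoint format concrete, here is a minimal sketch that builds the OCI Amazon S3 compatible hostname and the arguments for the boto3 DataSync `create_location_object_storage` call. The helper names and sample values are hypothetical, and the choice of HTTPS on port 443 is an assumption for illustration.

```python
def oci_s3_compat_endpoint(namespace: str, region: str) -> str:
    """Build the OCI Amazon S3 compatible endpoint hostname."""
    return f"{namespace}.compat.objectstorage.{region}.oraclecloud.com"


def source_location_params(namespace: str, oci_region: str, bucket_name: str,
                           access_key: str, secret_key: str,
                           agent_arn: str) -> dict:
    """Assemble keyword arguments for datasync.create_location_object_storage."""
    return {
        "ServerHostname": oci_s3_compat_endpoint(namespace, oci_region),
        "ServerProtocol": "HTTPS",  # assumption: TLS on the default port
        "ServerPort": 443,
        "BucketName": bucket_name,
        "AccessKey": access_key,    # OCI Customer Secret Key pair from Step 1
        "SecretKey": secret_key,
        "AgentArns": [agent_arn],   # agent created in Step 4
    }
```

In practice, read the secret key from a secure source such as an environment variable or a secrets manager rather than hard-coding it, then pass the result to `boto3.client("datasync").create_location_object_storage(**params)`.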
2. Configure Amazon S3 as the destination DataSync location
a. Open the DataSync console, expand Data transfer, then select Locations and Create location.
b. For Location type, choose Amazon S3.
c. In S3 bucket, select the bucket created in Step 2 to use as a destination location.
d. For S3 storage class, choose a storage class that you want your objects to use when Amazon S3 is a transfer destination.
e. In IAM role, choose Autogenerate for DataSync to automatically create an IAM role with the permissions needed to access the S3 bucket or create a custom IAM role. See Creating an IAM role for DataSync to access your S3 bucket.
f. Choose Create location.
Figure 9. Create destination location
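A scripted equivalent might look like the following sketch, which prepares the arguments for the boto3 DataSync `create_location_s3` call. The helper name and the role ARN in the test values are hypothetical. Note that Autogenerate is a console convenience, so this sketch assumes you supply an existing IAM role that DataSync can assume.

```python
def destination_location_params(bucket_arn: str, role_arn: str,
                                storage_class: str = "STANDARD",
                                subdirectory: str = "/") -> dict:
    """Assemble keyword arguments for datasync.create_location_s3."""
    return {
        "S3BucketArn": bucket_arn,         # ARN copied in Step 2
        "S3StorageClass": storage_class,   # storage class for transferred objects
        "Subdirectory": subdirectory,      # prefix inside the bucket
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }
```

The result could then be passed as `boto3.client("datasync").create_location_s3(**params)`; the response contains the `LocationArn` used when creating the task.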
Step 6: Create, configure, and start the DataSync task
1. Create a DataSync task.
a. Open the DataSync console, choose Tasks, and select Create task.
b. Configure the source location. For Source location options, select Choose an existing location. For Existing Locations, choose the OCI Object Storage location previously created in Step 5.
c. Configure the destination location. For Destination location options, select Choose an existing location. For Existing Locations, select the Amazon S3 location previously created in Step 5.
Figure 11. Configure destination location
2. Configure Settings and start the DataSync task
a. Give a task name and incorporate the desired task execution and data transfer configuration as per your requirements.
Figure 12. Task execution configuration
b. OCI Object Storage does not support tagging at the object level, and the DataSync task may fail if it attempts to copy object tags that do not exist in OCI Object Storage. Under Data transfer configuration, deselect Copy object tags. Additional settings are as shown in the following figure.
Figure 13. Data Transfer configuration
c. Enable Logging with an auto-generated CloudWatch log group as shown in the following figure.
Figure 14. Logging configuration
d. Choose Create task, start the task with the default options, and verify the Task Status.
Figure 15. Task Status
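The task settings from this step can be sketched in code as follows. This is an illustrative outline rather than a complete implementation: the helper name is hypothetical, the `TRANSFER` log level is an assumption, and the CloudWatch log group (including the permissions that let DataSync write to it) would need to be arranged separately when you do not use the console. The key point is that `ObjectTags` is set to `NONE`, which corresponds to deselecting Copy object tags.

```python
def create_task_params(name: str, source_location_arn: str,
                       destination_location_arn: str,
                       log_group_arn: str) -> dict:
    """Assemble keyword arguments for datasync.create_task."""
    return {
        "Name": name,
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "CloudWatchLogGroupArn": log_group_arn,
        "Options": {
            # OCI Object Storage has no object-level tags, so do not copy them.
            "ObjectTags": "NONE",
            # Assumption: log each transferred object to CloudWatch.
            "LogLevel": "TRANSFER",
        },
    }
```

With a boto3 DataSync client, `create_task(**params)` returns a `TaskArn`, which can then be passed to `start_task_execution(TaskArn=...)` to begin the transfer.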
Step 7: Verify the data transferred
After the DataSync task completes successfully, compare the data in the destination S3 bucket with the source OCI Object Storage data to verify the transfer. In this example, the source OCI Object Storage data is now migrated to Amazon S3, as shown in the following figure.
With CloudWatch, you can monitor the status of any DataSync transfers currently in progress and check previous data transfer history. If you encounter any failure during the transfer process, then you can review logs in CloudWatch to investigate and find the root cause of the issue.
Figure 16. Migrated Amazon S3 data.
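One way to make this comparison repeatable is to compare object keys and sizes on both sides. The sketch below is an illustration under assumptions, not a DataSync feature: the function names are hypothetical, and `list_sizes` expects a boto3 S3 client (for the OCI side, a client created with the Amazon S3 compatible endpoint as its `endpoint_url`).

```python
from typing import Dict, List


def list_sizes(s3_client, bucket: str) -> Dict[str, int]:
    """Return {object key: size in bytes} for every object in a bucket."""
    sizes: Dict[str, int] = {}
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes


def compare_inventories(source: Dict[str, int],
                        destination: Dict[str, int]) -> Dict[str, List[str]]:
    """Report keys that are missing, differ in size, or exist only at the destination."""
    return {
        "missing": sorted(k for k in source if k not in destination),
        "size_mismatch": sorted(
            k for k in source
            if k in destination and source[k] != destination[k]
        ),
        "extra": sorted(k for k in destination if k not in source),
    }
```

An empty `missing` and `size_mismatch` list suggests that every source object arrived with the expected size. This is a lightweight check only; it does not compare object contents.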
Cleaning up
To avoid incurring charges, delete any unused resources discussed in this post:
- Delete the OCI Object Storage contents and bucket.
- Delete DataSync task, source and destination locations, and agent.
- Shut down and terminate the EC2 instance.
- Delete the VPC interface endpoint.
- Delete the S3 bucket.
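The DataSync part of the cleanup can be sketched as a small helper. The function name is hypothetical, and the ordering (task first, then locations, then agent) is an assumption chosen so that each resource is removed before the resources it depends on. The `datasync` argument is expected to be a boto3 DataSync client.

```python
from typing import Iterable


def delete_datasync_resources(datasync, task_arn: str,
                              location_arns: Iterable[str],
                              agent_arn: str) -> None:
    """Delete the task, then its locations, then the agent."""
    datasync.delete_task(TaskArn=task_arn)
    for location_arn in location_arns:
        datasync.delete_location(LocationArn=location_arn)
    datasync.delete_agent(AgentArn=agent_arn)
```

The EC2 instance, VPC endpoint, and buckets are separate resources and still need to be removed as described in the list above.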
Conclusion
When deciding which data to migrate, you must consider object storage performance needs, resilience requirements, and cost benefits.
In this post, we discussed how to migrate object storage data from OCI Object Storage to Amazon S3 using DataSync. This solution provides a simple and efficient way to migrate object storage data securely and quickly. By following the steps outlined in this post, you can migrate your object storage data from OCI Object Storage to Amazon S3 for your applications to use.
DataSync provides several features, such as bandwidth optimization and control, data encryption and validation, data integrity check, and task scheduling. These features make it simple to move data to and from AWS by taking the hassle out of data movement and integrating with multiple cloud providers. To learn more about how to use this cost-effective service, see the DataSync documentation and DataSync pricing page.