AWS Storage Blog
Migrate data from Google Drive to Amazon S3 using Rclone
Whether you choose to operate entirely on AWS or in multicloud and hybrid environments, one of the primary reasons to adopt AWS is the broad choice of services we offer, enabling you to innovate, build, deploy, and monitor your workloads.
Amazon S3 is a great option for Google Drive users seeking a comprehensive storage solution. Amazon S3 offers very high durability and availability, virtually unlimited scalability, and performant, cost-effective storage options. With its pay-as-you-go pricing model and storage classes optimized for various access patterns and cost requirements, Amazon S3 caters to diverse needs, from managing mission-critical data to storing data for backup and archive. To migrate data from Google Drive to S3, you can use rclone, an open-source command-line tool that provides a streamlined solution for transferring data.
In this post, we demonstrate how you can use rclone to move data from Google Drive to Amazon S3. We walk you through the process of setting up an Amazon Elastic Compute Cloud (EC2) instance with rclone installed and configured to transfer data from Google Drive to Amazon S3. The majority of this setup process is automated through AWS CloudFormation. We also explore different rclone flags to reduce data transfer time while addressing service quotas. Rclone simplifies the migration process with its native support for both storage systems, enabling synchronization and efficient data transfer. Its customizable configuration options, such as controlling concurrent transfers and setting transaction rate limits, help optimize the transfer process.
Solution overview
The following diagram illustrates the architecture for transferring data from Google Drive to Amazon S3 using an EC2 instance with rclone installed. The EC2 instance, which is provisioned through CloudFormation, acts as an intermediary to facilitate the data transfer process. Rclone, running on the EC2 instance, copies data from Google Drive to Amazon S3.
Figure 1: Architecture diagram to transfer data from Google Drive to Amazon S3 using Rclone
CloudFormation automates the deployment and configuration of the EC2 instance and the rclone setup. This provides a consistent, reproducible environment while reducing manual errors, time, and effort. Rclone, configured on the EC2 instance, connects to both Google Drive and Amazon S3 and copies data from Google Drive to S3.
Prerequisites
You need the following prerequisites to implement this solution:
- Make sure you have the necessary permissions to access AWS Identity and Access Management (IAM), AWS Secrets Manager, and Amazon EC2.
- You should have an Amazon Virtual Private Cloud (Amazon VPC), a subnet in that VPC, and an EC2 key pair. You select these as inputs when you set up the CloudFormation stack.
- Decide between the x86 and AWS Graviton-based instances (ARM), and choose the EC2 instance type that’s offered in your subnet’s Availability Zone (AZ). Refer to the user guide for finding an EC2 instance type to filter by architecture and AZ. For best price-performance, we recommend the ARM instance unless you have a specific requirement for an x86.
- Rclone suggests making your own Google Drive API Client ID instead of using the default one. Refer to the rclone documentation for creating your own Google Drive Client ID and to learn why it’s recommended.
- You need an S3 bucket as a destination data store.
Review the CloudFormation template to understand IAM user permissions and adjust as necessary. Refer to IAM policies best practices for more details. Similarly, check and update security groups for the EC2 instance if needed.
The CloudFormation template automates the rclone setup for Amazon S3, but Google Drive needs manual token authorization after you connect to the EC2 instance. We cover this later in the post.
Note that the sample solution in this post is designed to work in the US East (N. Virginia) Region (us-east-1). If you’d like to deploy this solution in a different AWS Region, refer to the instructions in the “Deploying in a different AWS Region” section of this post.
Walkthrough
Transferring data from Google Drive to Amazon S3 in this post involves:
1. Deploying the CloudFormation template, which provisions an EC2 instance, installs rclone, and configures rclone remote connections.
2. Completing Google Drive token authorization after connecting to the EC2 instance.
3. Using rclone commands to transfer data from Google Drive to Amazon S3.
In the following sections we go through these steps in more detail.
1. CloudFormation stack deployment
This section will walk you through deploying the CloudFormation template to create necessary resources for data transfer.
1.1. Create stack
1.1.1. Download the CloudFormation template for this post, Final-CFT-GDrive-S3-Review.yaml, and then open the CloudFormation console.
1.1.2. On the Stacks page, choose Create stack at top right, and then choose With new resources (standard).
1.1.3. On the Create stack page, in the Prepare template section, select Choose an existing template. In the Specify template section, select Upload a template file, and then select Choose file to upload the Final-CFT-GDrive-S3-Review.yaml template that you downloaded earlier.
1.1.4. Select Next.
1.2. Specify stack details
Figure 2: CloudFormation stack configuration
1.2.1. Provide a unique Stack name, as shown in the preceding figure. For example, “Rclone-GDrive-to-S3.”
1.2.2. Select the VPC and Subnet in which to create your EC2 instance.
1.2.3. Enter your preferred Instance Type. Follow the instructions given in the Prerequisites section for details on selecting the compatible instance type.
1.2.4. Select the EC2 key pair.
1.2.5. In Your IP Address, enter the IP address range from which you want to allow inbound traffic to your instance.
1.2.6. Enter the Client ID and Client secret that you created in the Google Drive API Console, as described in the Prerequisites section.
1.3. Configure stack options
You can leave the options as default and select Next.
1.4. Review and create
In the Capabilities section at the bottom of the page, select the check box to acknowledge that CloudFormation might create IAM resources, and then select Submit.
2. Google Drive authorization token
This section guides you through the process of authorizing rclone to access your Google Drive account.
2.1. Connecting to Google Drive Remote
After completing the CloudFormation deployment, connect to your EC2 instance: “Rclone Instance – GDrive to S3.”
The first step after connecting to the EC2 instance is to authorize rclone for Google Drive access. This is an additional security step on top of providing the Google Drive client_id and client_secret for rclone access.
Run the following command and enter n when prompted, because you are working on a remote (headless) machine: the Ubuntu-based EC2 instance has no browser.
rclone config reconnect gdrive-remote:
Figure 3: Configuring rclone to connect to Google Drive remote
After you enter n, as shown in the following image, rclone provides a link to get the token from Google for rclone. Copy the link and paste it in a browser. Make sure you are logged into the Google account associated with the Google Drive you want to access.
Figure 4: Accessing the link given by rclone to authenticate with Google Drive
2.2. Authorization with Google Drive
After following a series of prompts from Google (Choose an account > rclone wants to access your Google Account > Authorization code), you should see a screen that looks like the following image, which contains the authorization code.
Figure 5: Authorization code from Google
Copy the authorization code, go back to the rclone EC2 instance, and paste it in the Enter verification code> field.
Rclone then asks whether you want to Configure this as a team drive, as shown in the following image. Enter the choice that is appropriate for your use case.
Figure 6: Providing the authorization code to rclone from Google
2.3. Rclone remote connections configuration
When provisioning the EC2 instance with the CloudFormation template, an initial rclone config file was created for the Amazon S3 and Google Drive configurations. Running the following code shows this initial configuration file, as shown in the following image.
nano /home/ubuntu/.config/rclone/rclone.conf
Figure 7: Rclone configuration file
If you’d like to create or update configurations according to your requirements, you can edit this configuration file directly at the preceding file path. Alternatively, you can start an interactive configuration session after connecting to your instance by running the rclone config command and then editing or updating from there, as shown in the following image:
Figure 8: Editing Rclone configuration
Refer to the Amazon S3 rclone configuration and Google Drive rclone configuration for detailed instructions on customizing remotes to meet your needs.
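As a rough illustration of what such a file can look like, the following sketch writes a sample configuration to a scratch file instead of touching the real one. The section names match the remotes used in this post and the keys are standard rclone options, but every value is a placeholder, and the file generated by the CloudFormation template may contain different or additional keys (for example, the Google Drive token that is added after authorization).

```shell
# Write an illustrative rclone config to a scratch file (NOT the live config).
# All values are placeholders; the template-generated file may differ.
sample_conf="$(mktemp)"

cat > "$sample_conf" <<'EOF'
[s3-remote]
type = s3
provider = AWS
access_key_id = EXAMPLE_ACCESS_KEY_ID
secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
region = us-east-1

[gdrive-remote]
type = drive
client_id = EXAMPLE_CLIENT_ID
client_secret = EXAMPLE_CLIENT_SECRET
scope = drive
EOF

# List the remote names defined in the sample file.
grep '^\[' "$sample_conf"
```

To experiment with a scratch file like this without affecting the real configuration, you can point rclone at it with the --config flag.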
3. Transferring and managing files between Google Drive and Amazon S3
In this section, you will see how to use rclone to transfer data from Google Drive to Amazon S3 and perform various file management operations.
For this post, the Google Drive and Amazon S3 remote connections are named gdrive-remote and s3-remote. You can run the following commands to list the objects in a specified path.
rclone ls <remote>:<folder_name>/<subfolder_name>
rclone ls <remote>:<bucket_name>/<folder_name>
Figure 9: Listing objects within specified paths from remotes using rclone list command
Start by copying files from one remote to another. The rclone copy command copies files from source to destination. For test purposes, consider transferring around 1 TB of data from Google Drive to an S3 bucket.
The direct command to copy is as follows:
rclone copy <source>:<sourcepath> <dest>:<destpath>
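Before starting a large copy, it can help to preview what rclone would transfer. The following sketch wraps the copy command with rclone’s --dry-run flag, which reports the planned operations without copying anything. The function name preview_copy and the folder and bucket names in the usage comment are placeholders for illustration, not part of the original solution.

```shell
# Preview a copy without transferring any data.
# Usage: preview_copy <source>:<sourcepath> <dest>:<destpath>
preview_copy() {
    rclone copy --dry-run "$1" "$2"
}

# Example call (folder and bucket names are placeholders):
# preview_copy gdrive-remote:gdrive-test-folder-2 s3-remote:EXAMPLE-DESTINATION-BUCKET/gdrive-test-folder-2
```

Wrapping the command in a function keeps the dry run and the real copy pointed at the same paths, so you only remove --dry-run once the preview looks right.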
However, a large transfer also has to respect the quotas of both storage service providers while keeping transfer speed high. Rclone flags add extra functionality to rclone commands, enabling you to manage data across remotes more efficiently. By using the appropriate flags, you can strike a balance between adhering to the quotas and minimizing the time needed for data transfers.
The following command shows the flags used in this post. The rest of this section explains how the flags work before you run the transfer.
rclone copy \
  --tpslimit 200 \
  --transfers 200 \
  --buffer-size 200M \
  --checkers 400 \
  --s3-upload-cutoff 100M \
  --s3-chunk-size 100M \
  --s3-upload-concurrency 50 \
  gdrive-remote: \
  s3-remote:EXAMPLE-DESTINATION-BUCKET \
  -P
The -P/--progress flag shows real-time transfer statistics during file operations. The other flags work as follows:
- --tpslimit limits the number of transactions per second (TPS). For an HTTP backend, a transaction (or query) is a request such as a PUT, GET, or POST.
- --transfers controls the number of simultaneous file transfers. By default, rclone performs four parallel transfers.
- --buffer-size=SIZE specifies the in-memory buffer size for each transfer to improve transfer speed. Each transfer counted by --transfers uses the specified amount of memory for buffering.
- --checkers=N controls the number of checkers that run in parallel during operations such as copying files. Checkers compare source and destination files to determine which files need to be transferred.
Rclone supports Amazon S3 multipart upload. Multipart upload enables the uploading of a single object as multiple parts, each representing a continuous portion of the object’s data. It’s recommended to use multipart uploads instead of single operations when the object size exceeds 100 MB. Refer to the multipart uploads with rclone documentation for more information.
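In the command used in this post, --s3-upload-cutoff sets the size above which rclone switches to multipart upload, --s3-chunk-size sets the part size, and --s3-upload-concurrency sets how many parts of one file are uploaded in parallel. The rclone S3 documentation notes that multipart uploads can use up to --transfers × --s3-upload-concurrency × --s3-chunk-size of extra memory. The following sketch works out that worst-case bound, plus the read buffers, for the flag values shown earlier. It is only an upper bound under the assumption that every transfer slot is busy with a large multipart file at the same time; real usage depends on your file sizes.

```shell
# Flag values from the copy command in this post (sizes in MiB).
transfers=200
buffer_mib=200
chunk_mib=100
upload_concurrency=50

# Worst case: every transfer slot holds a full read buffer.
buffer_total_mib=$((transfers * buffer_mib))

# Worst case: every transfer slot is a multipart upload with all parts in flight.
multipart_total_mib=$((transfers * upload_concurrency * chunk_mib))

echo "Read buffers:     up to ${buffer_total_mib} MiB"
echo "Multipart chunks: up to ${multipart_total_mib} MiB"
```

If the bound is far above the memory of your chosen instance type, consider lowering --transfers or --s3-upload-concurrency and watching memory usage during a test run.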
Google Drive quotas
Because the Google Drive API is a shared service with many users, Google sets quotas, such as the number of queries allowed within a defined time window.
You can find more info in the documentation on usage limits from Google Drive.
To check your quotas, refer to the Google cloud view and manage quotas documentation, or you can follow these steps:
1. Visit the Google API Console.
2. Select your project. In the Enabled APIs & Services section, select Google Drive API.
3. Under QUOTAS & SYSTEM LIMITS, you can see properties such as “Queries per minute” and “Queries per minute per user.”
In our case, we have “Queries per minute” and “Queries per minute per user” at 12,000, which roughly translates to 200 per second.
Amazon S3 quotas
Amazon S3, on the other hand, supports significantly higher request rates. It allows at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix, with no limit on the number of prefixes per bucket.
During scaling, temporary 503 errors (slow down) may occur, but they subside once scaling completes. Refer to the best practices design patterns for optimizing Amazon S3 performance.
To stay within the quotas, consider the values for both remotes and set --tpslimit to the lower of the two. Because Google Drive allows only 200 queries per second in this case, set the flag as --tpslimit 200.
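The arithmetic behind that choice can be sketched as follows. The two quota values are the ones quoted in this post (12,000 Google Drive queries per minute and 3,500 S3 write requests per second per prefix); substitute the values from your own project before relying on the result.

```shell
# Quota values quoted in this post; replace with your own.
drive_queries_per_minute=12000
s3_writes_per_second=3500

# Convert the per-minute Google Drive quota to a per-second rate.
drive_tps=$((drive_queries_per_minute / 60))

# Use the lower of the two rates as the rclone limit.
if [ "$drive_tps" -lt "$s3_writes_per_second" ]; then
    tps_limit=$drive_tps
else
    tps_limit=$s3_writes_per_second
fi

echo "--tpslimit ${tps_limit}"
```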
Figure 10.1: Rclone copy command with the flags to copy the data from Google Drive to the S3 bucket
Now you can implement the preceding commands. The following image shows that transferring 1001.06 GB of data from Google Drive to Amazon S3 took approximately 10 minutes and 15.7 seconds.
Figure 10.2: Output of the copy command showing the details about the transfer
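To put those numbers in perspective, the following sketch derives the average throughput from the reported size and duration. It works out to roughly 1.6 GB per second, or about 13 Gbit per second, which is useful to compare against the network bandwidth of your instance type. The variable names are illustrative only.

```shell
# Values reported for the test transfer in this post.
data_gb=1001.06
elapsed_s=$(awk 'BEGIN { print 10 * 60 + 15.7 }')

# Average throughput in gigabytes and gigabits per second.
gbytes_per_s=$(awk -v gb="$data_gb" -v s="$elapsed_s" 'BEGIN { printf "%.2f", gb / s }')
gbits_per_s=$(awk -v gb="$data_gb" -v s="$elapsed_s" 'BEGIN { printf "%.1f", gb / s * 8 }')

echo "Average throughput: ${gbytes_per_s} GB/s (${gbits_per_s} Gbit/s)"
```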
check: Verifies that the files in the source and destination are identical.
rclone check gdrive-remote: s3-remote:EXAMPLE-DESTINATION-BUCKET -P
Figure 11. Rclone check command
delete: Deletes files from the given path.
rclone delete gdrive-remote:gdrive-test-folder-2 -P
Figure 12. Rclone delete command
Refer to the rclone commands documentation for a complete list of available commands, such as rclone sync if you’d like to sync the source with the destination.
Monitoring EC2 instance
Refer to the rclone documentation to learn more about these flags and when to adjust their values. The different flags discussed in this post primarily use the network bandwidth and memory of the EC2 instance. We recommend experimenting with different flag values while monitoring your instance’s performance to achieve optimal results without exceeding the throttle limits of your EC2 instance.
For example, suppose you run the preceding command to test transferring 1 TB of data and notice that your EC2 instance’s CPU usage is 30%, memory usage is 60%, and network usage is 70%. Ideally, you want to stay under 100%, with a safer target of around 90%. These metrics therefore indicate that there is headroom to increase the values of flags such as --transfers to speed up the transfer.
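One simple way to turn those readings into a next value to try is sketched below. It assumes, purely for illustration, that resource usage grows roughly in proportion to --transfers, which will not hold exactly in practice. Treat the result as a starting point for the next test run, and keep --tpslimit in place so the higher concurrency does not exceed the Google Drive quota.

```shell
# Example readings from this post (percent) and the current flag value.
current_transfers=200
cpu_pct=30
mem_pct=60
net_pct=70
target_pct=90

# The busiest resource determines how much headroom is left.
busiest_pct=$cpu_pct
for pct in "$mem_pct" "$net_pct"; do
    if [ "$pct" -gt "$busiest_pct" ]; then
        busiest_pct=$pct
    fi
done

# Assumption: usage scales linearly with --transfers (integer division rounds down).
suggested_transfers=$((current_transfers * target_pct / busiest_pct))

echo "Busiest resource at ${busiest_pct}%; try --transfers ${suggested_transfers}"
```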
You can monitor your EC2 instance using Amazon CloudWatch, an application performance monitoring service.
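If you prefer the command line to the console, a minimal sketch with the AWS CLI might look like the following. It requests the standard AWS/EC2 CPUUtilization metric as 5-minute averages; the function name ec2_cpu_average and the instance ID and timestamps in the usage comment are placeholders. Memory usage is not among the default EC2 metrics and needs additional setup, as covered in the memory metrics post linked at the end of this article.

```shell
# Fetch 5-minute average CPU utilization for one EC2 instance.
# Usage: ec2_cpu_average <instance-id> <start-time> <end-time>
# Times are ISO 8601 UTC timestamps, for example 2024-01-01T00:00:00Z.
ec2_cpu_average() {
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name CPUUtilization \
        --dimensions "Name=InstanceId,Value=$1" \
        --start-time "$2" \
        --end-time "$3" \
        --period 300 \
        --statistics Average
}

# Example call (placeholder instance ID and time range):
# ec2_cpu_average i-0123456789abcdef0 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```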
Other ways to optimize the data transfer process
The following methods can help you further improve the data transfer process:
1. Higher performance compute: A direct option is to select an EC2 instance type with higher performance. Try to have a balance among the following factors: Amazon EC2 usage time, instance cost, and data transfer completion time.
2. TPS quotas: To raise the TPS ceiling, you can request a quota increase from Google.
3. Other relevant rclone flags: You can look into the following flags and evaluate if they are relevant to your preferences. Note that a few of these flags at times may cross the quota values such as TPS, so consider them with caution:
Deploying in a different AWS Region
To deploy the solution in an AWS Region other than US East (N. Virginia), modify it as follows:
- Replace the ImageId value in the CloudFormation template with the Amazon Machine Image (AMI) ID of “Ubuntu Server 22.04 LTS (HVM), SSD Volume Type” for your preferred architecture and AWS Region. Refer to the finding a Linux AMI documentation to look for these details.
- Replace the region value in the CloudFormation template with your desired AWS Region code.
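As one possible way to look up the AMI ID from the command line, the following sketch queries AWS Systems Manager Parameter Store. It assumes Canonical’s public parameter naming for Ubuntu images, which you should verify against Canonical’s documentation before use, and the function names are hypothetical. Pass amd64 for x86 instances or arm64 for AWS Graviton-based instances.

```shell
# Build the public SSM parameter name for Ubuntu Server 22.04 LTS.
# ASSUMPTION: path follows Canonical's published naming; verify before use.
# Usage: ubuntu_ami_parameter <amd64|arm64>
ubuntu_ami_parameter() {
    printf '/aws/service/canonical/ubuntu/server/22.04/stable/current/%s/hvm/ebs-gp2/ami-id\n' "$1"
}

# Look up the AMI ID for an architecture in a given AWS Region.
# Usage: ubuntu_ami_id <amd64|arm64> <region>
ubuntu_ami_id() {
    aws ssm get-parameter \
        --name "$(ubuntu_ami_parameter "$1")" \
        --region "$2" \
        --query 'Parameter.Value' \
        --output text
}

# Example call (Region is a placeholder):
# ubuntu_ami_id arm64 eu-west-1
```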
Cleaning up
You may want to delete the resources created in this post to avoid unwanted future charges. To delete the stack resources, you can delete the CloudFormation stack. In addition, visit the Google Cloud Console credentials page and delete the OAuth credentials that you created earlier.
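If you prefer to clean up from the command line, a minimal sketch with the AWS CLI is shown below. The function name is hypothetical, and the stack name in the usage comment is the example name suggested earlier in this post; use the name you actually gave your stack. Deleting the stack does not remove the Google OAuth credentials or the data already copied to your S3 bucket.

```shell
# Delete a CloudFormation stack and wait until the deletion finishes.
# Usage: delete_migration_stack <stack-name>
delete_migration_stack() {
    aws cloudformation delete-stack --stack-name "$1"
    aws cloudformation wait stack-delete-complete --stack-name "$1"
}

# Example call (stack name from the walkthrough example):
# delete_migration_stack Rclone-GDrive-to-S3
```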
Conclusion
In this post, we explored how to efficiently transfer large amounts of data from Google Drive to Amazon S3 using Rclone. We automated the setup process of EC2 instance creation, Rclone installation, and remote connections configuration through AWS CloudFormation, reducing manual effort and potential errors. We also explored various Rclone flags that optimize transfer times while staying within service quotas, helping you avoid throttling and delays. This approach allows you to customize the data transfer process to match your specific requirements, ensuring efficient and reliable migrations even for large datasets.
By moving your data to Amazon S3, you can take advantage of cost savings, and its scalability and performance make it a great choice for your data lakes, analytical, and ML/AI applications. Additionally, Amazon S3 provides extensive security features and supports numerous compliance regulations, enhancing data protection and governance.
Feel free to check out other posts that might be helpful in migrating data and monitoring:
- Migrating Google Cloud Storage to Amazon S3 using AWS DataSync
- Setup memory metrics for Amazon EC2 instances using AWS Systems Manager
Thank you for reading this post. If you have any comments, feel free to post them in the comments section.