AWS Storage Blog
How Jemena approached data migration using AWS DataSync and shared VPCs
Organizations starting their cloud migration journey must make several design choices about their AWS architecture. Some of these design choices relate to organizational structure, the number of AWS accounts, Virtual Private Cloud (VPC) options, and other details. Depending on these upfront choices, the tooling and approach to migrate data from an on-premises system to AWS storage services can vary.
AWS DataSync customer Jemena delivers energy to one in five Australian homes and businesses. Jemena owns and operates some of the country’s most exciting energy projects, including projects in hydrogen and biomethane, electric vehicles, and solar energy efficiency. In this post, we explore Jemena’s architectural design choices, and how Jemena was able to migrate 47 TB of data from their on-premises file systems to AWS using AWS DataSync in concert with their chosen shared-VPC architecture.
Architecture overview
Shared VPCs offer an elegant way to distribute the VPC boundary across multiple AWS accounts. Network constructs, such as subnets, Network Access Control Lists (NACLs) and VPC endpoints, are created in a central account (owner) and made available to multiple AWS accounts (participants) through a sharing agreement facilitated by AWS Resource Access Manager. Participants can only view, create, modify, and delete application resources (for example, an Amazon EC2 instance) in the subnets shared with them; however, they can’t view, modify, or delete resources that belong to other participants or the VPC owner.
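As a rough illustration of the sharing step described above, the following Python sketch builds the request that the owner account could pass to the AWS RAM `CreateResourceShare` API. The share name, subnet ARN, and account IDs are hypothetical placeholders, and setting `allowExternalPrincipals` to `False` assumes that all participants belong to the same AWS Organization; none of these values come from Jemena's environment.

```python
def build_subnet_share_request(share_name, subnet_arns, participant_account_ids):
    """Build keyword arguments for the AWS RAM CreateResourceShare API."""
    if not subnet_arns:
        raise ValueError("at least one subnet ARN is required")
    return {
        "name": share_name,
        "resourceArns": list(subnet_arns),
        "principals": list(participant_account_ids),
        # Assumption: participants are in the same AWS Organization.
        "allowExternalPrincipals": False,
    }


# Hypothetical values for illustration only.
request = build_subnet_share_request(
    "shared-vpc-private-subnets",
    ["arn:aws:ec2:ap-southeast-2:111111111111:subnet/subnet-0123456789abcdef0"],
    ["222222222222", "333333333333"],
)
```

With boto3 installed and credentials for the owner account, the request could be submitted with `boto3.client("ram").create_resource_share(**request)`.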
Jemena decided to implement shared VPCs for the following reasons:
- Simplified design with no complexity around inter-VPC connectivity
- Fewer managed VPCs and simplified VPC endpoint policies
- Segregation of duties between network teams and application owners
- Better IPv4 address utilization
- Cost optimization through fewer hourly transit gateway (TGW) attachment charges and potentially lower inter-AZ traffic charges
- Centralized VPC endpoint architecture
The following diagram provides a high-level overview of Jemena’s design choices:
The diagram shows two VPCs created in the core-network account and shared with the core-prod and core-nonprod accounts.
AWS account | Description |
core-network | The account that owns the VPCs and shares them with the participant accounts. |
core-prod | The participant account with which the VPC is shared. This is the account where the production application workloads are hosted. |
core-nonprod | The participant account with which the VPC is shared. This is the account where the non-production application workloads are hosted. |
Data-transfer requirements and challenges
As part of their all-in migration to AWS, Jemena needed to transfer 47 TB of data from their on-premises NAS file systems to Amazon FSx for Windows File Server, within the boundaries of their shared VPC architecture. Jemena had the following requirements for the data migration:
- Data transfers should use cloud-native services as a preference.
- All communications must remain private and not traverse the public internet.
- Data transfers must preserve file metadata and permissions.
In line with their cloud-native service requirement, Jemena chose AWS DataSync as the data transfer service to migrate their on-premises file systems to Amazon FSx for Windows File Server. The DataSync agent was deployed on their on-premises infrastructure, where it communicated with the on-premises file systems over the Server Message Block (SMB) protocol, as shown in the following diagram.
AWS DataSync is an online data transfer service that simplifies moving data between storage systems and AWS services. One of the primary use cases for DataSync is moving datasets rapidly over the network into Amazon FSx for Windows File Server.
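To make the metadata and permissions requirement more concrete, here is a minimal sketch of the request a DataSync `CreateTask` call could take for an SMB to FSx for Windows File Server transfer. The location ARNs and task name are hypothetical, and the option values are one reasonable choice rather than the settings Jemena used.

```python
def build_task_request(source_location_arn, destination_location_arn, copy_sacl=False):
    """Build keyword arguments for the DataSync CreateTask API."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": "nas-to-fsx-migration",  # hypothetical task name
        "Options": {
            # Copy the owner and DACL; optionally include the SACL as well.
            "SecurityDescriptorCopyFlags": "OWNER_DACL_SACL" if copy_sacl else "OWNER_DACL",
            # Preserve modification times; access times are best effort.
            "Mtime": "PRESERVE",
            "Atime": "BEST_EFFORT",
            # Verify only the files that were transferred.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
    }


# Hypothetical location ARNs for illustration only.
task_request = build_task_request(
    "arn:aws:datasync:ap-southeast-2:222222222222:location/loc-11111111111111111",
    "arn:aws:datasync:ap-southeast-2:222222222222:location/loc-22222222222222222",
)
```

With boto3 available, the request could be submitted with `boto3.client("datasync").create_task(**task_request)`. Copying the SACL typically requires additional privileges for the account that DataSync uses, so it is left off by default in this sketch.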
The next step was establishing a communications path between the on-premises DataSync agent and the DataSync service.
AWS DataSync provides both public and private endpoint options that enable the DataSync agent to communicate with AWS. Jemena required that all communications remain private and not traverse the public internet, so they used an AWS DataSync VPC endpoint.
When the DataSync agent is activated over a VPC endpoint, the control plane and data plane traffic follow different paths and use different network ports:
- The DataSync agent uses the VPC endpoint for control plane traffic to the DataSync service.
- The DataSync agent creates a private connection to the DataSync service, which talks to Amazon FSx for Windows File Server for the data transfer phase.
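The split between the two paths can be sketched as two sets of security group ingress rules, one for the VPC endpoint and one for the network interfaces that carry the data transfer. The port numbers below (TCP 1024 to 1064 for control plane traffic and TCP 443 for data plane traffic) reflect our reading of the DataSync network requirements for VPC endpoints and should be treated as an assumption to verify against the current documentation. The agent CIDR is a hypothetical value.

```python
def agent_ingress_rules(agent_cidr):
    """Return (endpoint_rules, task_eni_rules) in the EC2 IpPermissions shape."""

    def tcp(from_port, to_port, description):
        return {
            "IpProtocol": "tcp",
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": agent_cidr, "Description": description}],
        }

    # Assumed port range for control plane traffic to the VPC endpoint.
    endpoint_rules = [tcp(1024, 1064, "DataSync control plane from agent")]
    # Assumed port for data plane traffic to the transfer network interfaces.
    task_eni_rules = [tcp(443, 443, "DataSync data plane from agent")]
    return endpoint_rules, task_eni_rules


# Hypothetical on-premises agent address.
endpoint_rules, task_eni_rules = agent_ingress_rules("10.20.0.15/32")
```

Each list has the shape that the EC2 `AuthorizeSecurityGroupIngress` API accepts in its `IpPermissions` parameter.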
Note that in a shared VPC architecture, VPC endpoints can only be created in the VPC owner account (the core-network account in this post). This meant that the endpoint that the on-premises DataSync agent uses for control plane traffic to the DataSync service couldn’t be created in the participant accounts (core-prod and core-nonprod).
Alternative approaches were explored, such as:
- Relocating the Amazon FSx share to the VPC owner account (core-network). This was unacceptable because the Amazon FSx file server had to be in the participant accounts (core-prod and core-nonprod) alongside the application workload so that it could be managed by the application teams.
- Using manual methods such as copying files, WinRAR, and Robocopy to facilitate the data transfer. These methods either could not retain file metadata and permissions or were not cloud-native. The DataSync service met all the requirements for this migration.
Solution
To address this challenge, the approach adopted by Jemena was to create an additional VPC (placeholder VPC in this post) in the participant account (core-prod/core-nonprod). This placeholder VPC was created to satisfy the control plane traffic flow between the on-premises DataSync agent and the DataSync endpoint as shown in the following diagram.
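The placeholder VPC approach can be sketched as two requests: one to create the DataSync interface endpoint in the placeholder VPC, and one to register the agent against that endpoint. The sketch below only builds the parameters for the EC2 `CreateVpcEndpoint` and DataSync `CreateAgent` APIs. All identifiers, the Region, and the agent name are hypothetical, and the exact subnet and security group choices depend on your own network design.

```python
def build_endpoint_request(region, vpc_id, subnet_id, security_group_id):
    """Build keyword arguments for the EC2 CreateVpcEndpoint API."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }


def build_agent_request(activation_key, endpoint_id, subnet_arn, security_group_arn):
    """Build keyword arguments for the DataSync CreateAgent API."""
    return {
        "ActivationKey": activation_key,
        "AgentName": "onprem-datasync-agent",  # hypothetical agent name
        "VpcEndpointId": endpoint_id,
        "SubnetArns": [subnet_arn],
        "SecurityGroupArns": [security_group_arn],
    }


# Hypothetical placeholder-VPC identifiers in the participant account.
endpoint_request = build_endpoint_request(
    "ap-southeast-2", "vpc-0aaa1111bbb22222c", "subnet-0ddd3333eee44444f", "sg-0123456789abcdef0"
)
agent_request = build_agent_request(
    "ACTIVATION-KEY-FROM-AGENT",
    "vpce-0123456789abcdef0",
    "arn:aws:ec2:ap-southeast-2:222222222222:subnet/subnet-0ddd3333eee44444f",
    "arn:aws:ec2:ap-southeast-2:222222222222:security-group/sg-0123456789abcdef0",
)
```

With boto3 available, these could be passed to `boto3.client("ec2").create_vpc_endpoint(**endpoint_request)` and `boto3.client("datasync").create_agent(**agent_request)`, using credentials for the participant account that owns the placeholder VPC.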
As for the actual data transfer, when you configure a destination location within DataSync, several elastic network interfaces (ENIs) are created, with the number depending on the type of AWS service. For Amazon FSx for Windows File Server, DataSync creates four ENIs in the same subnet as the target file system.
In the following example, four ENIs are created in the private subnet of the shared VPC in the core-nonprod participant account when the destination location is configured in the DataSync service.
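For reference, configuring the destination location could look like the following sketch, which builds the parameters for the DataSync `CreateLocationFsxWindows` API. The file system ARN, security group ARN, user, domain, and subdirectory are hypothetical, and the password is passed in as an argument so that it is not hard-coded.

```python
def build_fsx_location_request(filesystem_arn, security_group_arn, user, password,
                               domain=None, subdirectory="/share"):
    """Build keyword arguments for the DataSync CreateLocationFsxWindows API."""
    request = {
        "FsxFilesystemArn": filesystem_arn,
        # Security group that allows access to the file system.
        "SecurityGroupArns": [security_group_arn],
        "User": user,
        "Password": password,
        "Subdirectory": subdirectory,
    }
    if domain is not None:
        request["Domain"] = domain
    return request


# Hypothetical values; in practice, read the password from a secrets store.
location_request = build_fsx_location_request(
    "arn:aws:fsx:ap-southeast-2:333333333333:file-system/fs-0123456789abcdef0",
    "arn:aws:ec2:ap-southeast-2:333333333333:security-group/sg-0fedcba9876543210",
    user="datasync-svc",
    password="example-only",
    domain="corp.example.com",
)
```

With boto3 available, the request could be submitted from the participant account with `boto3.client("datasync").create_location_fsx_windows(**location_request)`.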
As seen in the previous diagram:
- The DataSync agent is deployed close to the on-premises storage.
- The DataSync VPC endpoint for the DataSync service is created in the placeholder VPC to allow for DataSync control plane traffic. As noted, this VPC endpoint can’t be created in the shared VPC.
- The DataSync ENIs used for data transfers are created in the shared VPC within the participant account(s) where the Amazon FSx for Windows File Server resides (that is, core-prod and core-nonprod).
With this setup, Jemena was able to migrate their data to FSx file shares within the participant account(s).
Jemena deleted the placeholder VPC once the migration was complete. The placeholder VPC should only be deleted after the DataSync agent has been deleted.
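The cleanup order can be sketched as a small helper that deletes the agent first, then the VPC endpoint, and finally the placeholder VPC. The function accepts client objects that behave like the boto3 `datasync` and `ec2` clients; the function name and the identifiers you pass in are hypothetical, and a real cleanup would also need to remove tasks, locations, subnets, and security groups.

```python
def teardown_placeholder_vpc(datasync, ec2, agent_arn, endpoint_id, vpc_id):
    """Delete the DataSync agent before removing the placeholder VPC resources."""
    # 1. Remove the agent so nothing depends on the endpoint any longer.
    datasync.delete_agent(AgentArn=agent_arn)
    # 2. Remove the DataSync interface endpoint from the placeholder VPC.
    ec2.delete_vpc_endpoints(VpcEndpointIds=[endpoint_id])
    # 3. Remove the placeholder VPC itself.
    ec2.delete_vpc(VpcId=vpc_id)
```

Endpoint deletion is asynchronous, so in practice you may need to wait for the endpoint to be fully removed before the final call succeeds.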
Conclusion
In summary, Jemena utilized the AWS DataSync service to migrate their on-premises file systems to Amazon FSx for Windows File Server using a shared VPC architecture in a controlled, secure, and cost-efficient manner.
Organizations on a similar journey, undergoing an all-in migration to AWS and adopting a shared VPC architecture, can leverage this approach for a successful migration using AWS DataSync.