AWS Storage Blog
Automate Amazon S3 File Gateway on Amazon EC2 with Terraform by HashiCorp
Infrastructure as Code (IaC) is the practice of managing IT infrastructure through code and automation tools, reducing the errors, slow scaling, and overhead that come with manual management. For organizations implementing a hybrid cloud infrastructure, automation can ensure uniformity, scalability, and cost reduction while provisioning cloud resources efficiently. Automated provisioning and configuration enable organizations to adapt, innovate, and stay competitive, promoting consistency and agility in response to market dynamics.
A number of tools and services are available to customers to automate hybrid infrastructure through AWS Storage Gateway, a set of hybrid cloud storage services that provide on-premises access to virtually unlimited cloud storage. AWS CloudFormation is a service that enables users to automate the deployment of resources on AWS. However, many customers have centered their IaC practice on Terraform by HashiCorp to enable a consistent methodology for managing the infrastructure lifecycle of both in-cloud AWS resources and on-premises virtual infrastructure, thereby lowering operational overhead and providing better governance.
In our previous blog, Automate Amazon S3 File Gateway deployments in VMware with Terraform, we walked through using IaC to deploy AWS Storage Gateway using Terraform Cloud, HashiCorp’s managed service offering. In this blog post, we will guide you through the process of provisioning an Amazon Elastic Compute Cloud (Amazon EC2) based Storage Gateway using IaC. We’ll achieve this by utilizing the AWS Storage Gateway module in combination with the Terraform open source binary. You can build on the steps provided here to further automate deployment of Storage Gateway using a continuous integration and continuous delivery (CI/CD) pipeline.
AWS Storage Gateway overview
AWS Storage Gateway is a hybrid cloud storage service that gives your applications on-premises and in-cloud access to virtually unlimited cloud storage. You can deploy Storage Gateway as a virtual machine (VM) within your VMware, Hyper-V, or Linux KVM virtual environment, as an Amazon EC2 instance within your Amazon Virtual Private Cloud (Amazon VPC), or as a pre-configured physical hardware appliance.
Amazon S3 File Gateway offers Server Message Block (SMB) or Network File System (NFS) based access to data in Amazon S3, with local caching, for customers that have existing applications, tools, or processes that leverage a file interface. It’s used for on-premises applications and for Amazon EC2-resident applications that need file storage in S3 for object-based workloads.
Customers deploy an S3 File Gateway on Amazon EC2 for the following reasons:
- For copying backups or dumps of databases running on EC2 such as Microsoft SQL Server, Oracle, or SAP ASE.
- In data pipeline use cases in healthcare and life sciences, media and entertainment, and other industries, to move data from devices to Amazon S3.
- For archiving use cases where you can tier your file data to lower cost storage with Amazon S3 Lifecycle Policies.
Solution overview
We leverage the Terraform AWS Storage Gateway module to provision an EC2-based Storage Gateway on AWS. We provide end-to-end examples for creating a Storage Gateway virtual machine in a VPC, including activation, creation of an Amazon S3 bucket, and creation of NFS file shares.
The Terraform AWS Storage Gateway module contains S3 File Gateway examples for both SMB and NFS deployments on EC2 and VMware. The module creates a number of networking, IAM, and security resources that you can use in your deployment.
Solution walkthrough
Using the following steps, we will create an Amazon S3 File Gateway on an EC2 instance that will provide an NFS interface to seamlessly store and access files as objects in Amazon S3.
- Clone the module repository.
- Set up values for the Terraform variables.
- Trigger the deployment.
- Start using the file shares.
Prerequisites
- AWS account with permissions to create resources in AWS Identity and Access Management (IAM)
- IAM or federated user in your AWS account with the permissions to create and administer:
- Amazon S3 bucket
- AWS Storage Gateway
- Amazon CloudWatch Logs
- AWS KMS
- AWS Security Token Service
- Amazon EC2
- Amazon EBS
- Amazon VPC
- AWS IAM role and policies
- See Changing permissions for an IAM user to learn how to set up IAM permissions and define permission boundaries.
- Terraform version ≥ v1.2.0
Step 1. Clone the repository
Clone the repository using the git clone command as shown in the following example:
git clone https://github.com/aws-ia/terraform-aws-storagegateway
The following is the directory structure for this repo.
terraform-aws-storagegateway/
- modules/ # AWS Storage Gateway Modules
- aws-sgw # AWS Storage gateway module for activation and management
- ec2-sgw # EC2 Storage gateway
- s3-nfs-share # S3 File Gateway NFS
- s3-smb-share # S3 File Gateway SMB
- vmware-sgw # VMware Storage gateway
- examples/ # Full examples
- s3-nfs-filegateway-ec2 # S3 File Gateway on EC2 - NFS
- s3-smb-filegateway-ec2 # S3 File Gateway on EC2 - SMB
- s3-nfs-filegateway-vmware # S3 File Gateway on VMware - NFS
- s3filegateway-vmware # S3 File Gateway on VMware - SMB
To provision our File Gateway with Terraform, you’ll need at least two modules from the modules/ sub-directory: aws-sgw/ and ec2-sgw/. Depending on your use case, you’ll choose either the s3-nfs-share module for Linux clients using NFS versions 3 and 4.1 or the s3-smb-share module for Windows clients using the SMB protocol.
For the rest of this walkthrough, we will assume an NFS use case. We call these three modules from the main.tf file in the examples/s3-nfs-filegateway-ec2 directory, so this main.tf file is our root module. Change into that directory using the following command:
cd examples/s3-nfs-filegateway-ec2
Step 2. Set up values for the Terraform variables
First, we will assign appropriate values to the Terraform variables required by each module. The README.md
file for each module provides a description of all required and optional Terraform variables.
Calling the EC2-based Storage Gateway module
The following code snippets from the main.tf file show the child module blocks and example input variables.
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
// Adjust any variable values below
vpc_id = "vpc-abcdef123456"
subnet_id = "subnet-abcdef123456"
name = "my-storage-gateway"
availability_zone = data.aws_availability_zones.available.names[0]
aws_region = var.aws_region
//Add any other variables here
}
Note: The example main.tf file creates a new VPC and subnet to deploy this file gateway. If you have an existing VPC, you can pass in the corresponding VPC and subnet IDs by modifying the vpc_id and subnet_id variable assignments for the preceding ec2-sgw module.
To administer the Storage Gateway EC2 instance, connect directly using Secure Shell (SSH), or use Session Manager, a capability of AWS Systems Manager. For Session Manager, review the documentation for Setting up Session Manager and Working with Session Manager. To connect over SSH, create an Amazon EC2 key pair and set the ssh_key_name variable to the key pair name.
For an example of creating the Amazon EC2 key pair from an existing public key and setting the ssh_key_name variable, review examples/s3-nfs-filegateway-ec2/main.tf. Here, we set an existing public key path for the ssh_public_key_path variable. To create a public key for Amazon EC2, follow this procedure using ssh-keygen. Finally, ensure that the security group attached to the Storage Gateway EC2 instance allows SSH traffic.
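The following is a minimal sketch of that pattern, assuming resource and key names of our own choosing (the actual example file may differ): create an EC2 key pair from an existing public key on disk and pass its name to the gateway module.
resource "aws_key_pair" "sgw_admin" {
  # Illustrative key pair name; the public key path comes from a variable
  key_name   = "storage-gateway-admin"
  public_key = file(var.ssh_public_key_path)
}

module "ec2_sgw" {
  source = "aws-ia/storagegateway/aws//modules/ec2-sgw"

  # ...other required inputs such as vpc_id and subnet_id...

  ssh_key_name = aws_key_pair.sgw_admin.key_name
}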
Calling the AWS File Gateway module
The following code snippet calls the AWS Storage Gateway module for activation once the gateway VM is created.
module "sgw" {
depends_on = [module.ec2_sgw]
source = "aws-ia/storagegateway/aws//modules/aws-sgw"
gateway_name = "my-storage-gateway"
gateway_ip_address = module.ec2_sgw.public_ip
join_smb_domain = false
gateway_type = "FILE_S3"
}
If you want the Storage Gateway to join an Active Directory (AD) domain, then specify join_smb_domain = true and also set the input variables domain_controllers, domain_name, domain_password, and domain_username. See the module README.md Inputs section for a description of these variables.
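As a sketch, joining the gateway to an AD domain might look like the following; the domain values are purely illustrative, and the credentials are supplied through variables rather than hard-coded.
module "sgw" {
  depends_on = [module.ec2_sgw]
  source     = "aws-ia/storagegateway/aws//modules/aws-sgw"

  gateway_name       = "my-storage-gateway"
  gateway_ip_address = module.ec2_sgw.public_ip
  gateway_type       = "FILE_S3"

  # Join the gateway to an Active Directory domain (illustrative values)
  join_smb_domain    = true
  domain_name        = "corp.example.com"
  domain_controllers = ["10.0.0.10", "10.0.0.11"]
  domain_username    = var.domain_username
  domain_password    = var.domain_password # keep secrets out of source control
}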
Calling the S3 File Gateway NFS module
You can use the following code snippet after activating your storage gateway to automate the creation of NFS shares.
module "nfs_share" {
source = "aws-ia/storagegateway/aws//modules/s3-nfs-share"
share_name = "nfs_share_name"
gateway_arn = module.sgw.storage_gateway.arn
bucket_arn = "s3bucketname:arn"
role_arn = "iamrole:arn"
log_group_arn = "log-group-arn"
client_list = ["10.0.0.0/24","10.0.1.0/24"]
}
Note: client_list is a required variable that restricts which source CIDR blocks can connect to the NFS endpoint provided by AWS Storage Gateway. Also note that prerequisite resources such as the S3 bucket, IAM role, and CloudWatch log group must be created before using this submodule.
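If you are not using the provided example, the following is a minimal sketch of those prerequisite resources. All names are illustrative, and the IAM policy shows only a representative set of S3 permissions; see the Storage Gateway documentation for the complete list.
# Illustrative prerequisite resources for the s3-nfs-share submodule
resource "aws_s3_bucket" "file_share" {
  bucket = "my-file-gateway-bucket-example" # must be globally unique
}

resource "aws_cloudwatch_log_group" "file_share" {
  name = "/aws/storagegateway/my-storage-gateway"
}

# Role that Storage Gateway assumes to access the bucket
data "aws_iam_policy_document" "sgw_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["storagegateway.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "file_share" {
  name               = "s3-file-gateway-role-example"
  assume_role_policy = data.aws_iam_policy_document.sgw_assume.json
}

# Representative bucket and object permissions for the file share role
data "aws_iam_policy_document" "bucket_access" {
  statement {
    actions   = ["s3:ListBucket", "s3:GetBucketLocation", "s3:ListBucketMultipartUploads"]
    resources = [aws_s3_bucket.file_share.arn]
  }
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"]
    resources = ["${aws_s3_bucket.file_share.arn}/*"]
  }
}

resource "aws_iam_role_policy" "file_share" {
  name   = "s3-file-gateway-bucket-access"
  role   = aws_iam_role.file_share.id
  policy = data.aws_iam_policy_document.bucket_access.json
}
You would then pass aws_s3_bucket.file_share.arn, aws_iam_role.file_share.arn, and aws_cloudwatch_log_group.file_share.arn into the bucket_arn, role_arn, and log_group_arn inputs of the nfs_share module.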
After setting the appropriate module input variables, we need to assign values for any Terraform variables in the root module that do not have a default value. Tfvars files are a common and simple way to assign variables in Terraform. We have provided an example terraform.auto.tfvars.example file. Rename it to terraform.auto.tfvars, then adjust the variable values using a text editor.
mv terraform.auto.tfvars.example terraform.auto.tfvars
The variables you set in the terraform.auto.tfvars
file will be passed into the module.
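As a sketch, a terraform.auto.tfvars file might look like the following; the values are illustrative, and the example’s variables.tf is the authoritative list of required variables.
# terraform.auto.tfvars (illustrative values)
aws_region                    = "us-east-1"
ssh_public_key_path           = "~/.ssh/id_rsa.pub"
ingress_cidr_blocks           = ["172.16.0.0/24"]
ingress_cidr_block_activation = "10.0.0.1/32"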
Step 3. Trigger the deployment
Before you can trigger a deployment, configure the AWS CLI credentials for Terraform using the service account that was created as part of the prerequisites.
- Run the command terraform init to download the modules and initialize the directory.
- Run terraform plan and examine the outputs.
- Run terraform apply and allow the apply to complete.
If the Terraform apply is successful, the output lists the resources that were created.
To view and examine the resources created by Terraform, you can use the terraform state list and terraform state show commands.
Step 4. Use the file share
1. Navigate to the Amazon EC2 console and verify the newly created gateway instance. Note the IP address of the EC2 instance.
2. In the AWS Management Console, navigate to AWS Storage Gateway.
3. Select the newly created Storage Gateway and confirm that its IP address matches the IP address you noted for the Storage Gateway virtual machine deployed on EC2.
4. Navigate to File shares from the left menu, or select the file share directly under Storage resources, to find the newly created file share. Copy the command provided to mount the file share.
5. Mount the NFS file share on your client. For more information, see the documentation on using an NFS file share.
6. Your NFS file share backed by S3 File Gateway is now ready to use.
Additional considerations
This section describes additional considerations as you use the Terraform module, including steps to toggle on or off the creation of security groups and the Active Directory domain configuration.
Network considerations
Terraform calls AWS APIs to manage the lifecycle of resources on AWS. Therefore, outbound internet connectivity to AWS API endpoints from the server or tool running Terraform will be needed to ensure Terraform can operate properly. For more information regarding the network requirements, refer to this page.
Activation workflow
A Storage Gateway activation request traverses two network paths. Activation requests sent by a client connect over port 80 (HTTP) to the gateway’s virtual machine (VM), which in this case is deployed on Amazon EC2. If the gateway successfully receives the activation request, then the gateway communicates with the Storage Gateway endpoints over port 443 (HTTPS) to receive an activation key and complete the Storage Gateway activation.
Storage Gateway does not require port 80 to be publicly accessible. The required level of access to port 80 depends on your network configuration. If you activate your gateway from a client virtual machine from which you connect and run the Terraform scripts, that client must have access to port 80 on your Storage Gateway. Once the gateway is activated, you may remove the rule that allows port 80 access from your client machine to the S3 File Gateway VM.
Storage Gateway VPC endpoint configuration
The latest version of the Terraform Storage Gateway module allows you to create an interface VPC endpoint for Storage Gateway. A VPC endpoint enables a private connection between the EC2 or VMware virtual appliance and the AWS Storage Gateway service. You can use this connection to activate your gateway and configure it to transfer data to AWS storage services without communicating over the public internet.
To create a VPC endpoint using the module, set the variable create_vpc_endpoint = true and supply the VPC ID, VPC endpoint subnets, and the private IP address of the EC2 gateway as Terraform variables. The examples/s3-nfs-filegateway-ec2/main.tf example sets this VPC endpoint related configuration when calling the module.
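A minimal sketch follows; input names other than create_vpc_endpoint and vpc_endpoint_security_group_id are illustrative, so check the module README for the exact variable names.
module "sgw" {
  depends_on = [module.ec2_sgw]
  source     = "aws-ia/storagegateway/aws//modules/aws-sgw"

  gateway_name       = "my-storage-gateway"
  gateway_ip_address = module.ec2_sgw.public_ip
  gateway_type       = "FILE_S3"

  # VPC endpoint configuration (illustrative input names; see the module README)
  create_vpc_endpoint        = true
  vpc_id                     = module.vpc.vpc_id
  vpc_endpoint_subnet_ids    = module.vpc.private_subnets
  gateway_private_ip_address = module.ec2_sgw.private_ip
}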
A security group is also needed for the VPC endpoint. In the preceding example, the module handles the creation of the security group. However, you may use the vpc_endpoint_security_group_id variable to associate an existing security group with the VPC endpoint. See this documentation, which shows the security group requirements for the Storage Gateway VPC endpoint. In this module, the security groups are already preconfigured with the required rules, using the private IP address of the Storage Gateway virtual machine. You can find the configuration in the file modules/aws-sgw/sg.tf.
S3 VPC endpoint configuration
We recommend that you create a separate VPC endpoint for Amazon S3 so that the S3 File Gateway transfers data through the VPC rather than through a NAT gateway or NAT instances. This allows for optimized and private routing to S3 with lower costs. In examples/s3-nfs-filegateway-ec2/main.tf, we have created a Gateway VPC endpoint as shown in the following example:
resource "aws_vpc_endpoint" "s3" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.${var.aws_region}.s3"
route_table_ids = module.vpc.private_route_table_ids
}
Security group configuration
The Terraform Storage Gateway module provides the ability to create the security group and the required rules for your gateway to communicate with the client machines and the Storage Gateway endpoints. You can achieve this by setting create_security_group = true. You can also limit access to a range of ingress CIDR blocks in your network from which you require access to the storage gateway by modifying the ingress_cidr_blocks attribute.
The module also includes the ingress_cidr_block_activation
variable specifically to limit access to the CIDR block of the client machine that activates the storage gateway on port 80. You can remove this security group rule once the gateway is activated. You can find the source code of the security group configuration in modules/ec2-sgw/sg.tf file.
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
vpc_id = var.vpc_id
subnet_id = var.subnet_id
ingress_cidr_block_activation = "10.0.0.1/32"
ingress_cidr_blocks = ["172.16.0.0/24", "172.16.10.0/24"]
create_security_group = true
}
You can toggle off the create_security_group variable by setting it to false if you use an already existing security group associated with your EC2-based storage gateway, or if you would like to create the security group outside of the EC2 Storage Gateway module deployment. You may then specify your own security group ID by appending the security_group_id attribute, as shown in the following example:
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
vpc_id = var.vpc_id
subnet_id = var.subnet_id
create_security_group = false
security_group_id = "sg-12345678"
}
DevOps best practices
To scale your EC2 File Gateway module usage across your organization, consider these steps:
- Store infrastructure templates in a code repository and set up automated pipelines for testing and deployment.
- Automate Terraform Infrastructure as Code (IaC) workflows using tools like Terraform Cloud or AWS Developer Tools for collaborative, scalable, and governed IaC.
- Encourage module reuse by leveraging the EC2 File Gateway module and storing it in the Terraform Cloud Module Registry or a Git repository like AWS CodeCommit.
- Protect the integrity of your Terraform state file by using a remote backend such as Amazon S3, and enable collaborative IaC with state file locking (a sketch follows this list).
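The following is a minimal sketch of that last recommendation: remote state stored in Amazon S3 with a DynamoDB table for state locking. The bucket, key, and table names are illustrative.
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket" # illustrative bucket name
    key            = "storage-gateway/s3-nfs-filegateway-ec2/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks" # enables state locking
  }
}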
For additional resources, see CI/CD best practices from AWS and Terraform considerations from HashiCorp.
Gateway sizing and performance
For a small gateway deployment hosting one to ten file shares per gateway, use an m5.xlarge EC2 instance with four vCPUs and 16 GiB of RAM, which is the default configuration in the Terraform EC2 Storage Gateway module. For higher performance to support more users and workloads in a medium or large deployment, consider m5.2xlarge or m5.4xlarge EC2 instances. You can find details on file shares and performance recommendations here.
The cache storage requirement ranges from 150 GB to 64 TB, typically sized for the hot data set, following the 80/20 rule. You can adjust the instance type and cache size by modifying the instance_type and cache_block_device disk_size attributes in the ec2-sgw module, in either the provided examples or your custom Terraform main.tf file.
As an example:
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
vpc_id = var.vpc_id
subnet_id = var.subnet_id
ingress_cidr_blocks = var.ingress_cidr_blocks
create_security_group = true
instance_type = "m5.2xlarge"
cache_block_device = {
disk_size = 150
}
}
For more information on the Storage Gateway vCPU and RAM sizing and requirements, consult this documentation page. To learn more about cache sizing, refer to this documentation.
When transferring large amounts of data to the Storage Gateway, deploy the File Gateway EC2 instance in the same Availability Zone as your client or your SQL Server EC2 instances to minimize cross-Availability Zone network charges. You can adjust the availability_zone
variable to match the desired zone during gateway creation.
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
vpc_id = var.vpc_id
subnet_id = var.subnet_id
ingress_cidr_blocks = var.ingress_cidr_blocks
create_security_group = true
availability_zone = "us-east-1a"
}
Data Encryption using KMS
By default, Storage Gateway uses Amazon S3-Managed encryption keys (SSE-S3) to server-side encrypt all data it stores in Amazon S3. You have an option to use the Storage Gateway API to configure your gateway to encrypt data stored in the cloud using server-side encryption with AWS Key Management Service (SSE-KMS) keys. For more information, refer to this link.
To encrypt the root and cache EBS volumes, append the cache_block_device and root_block_device blocks to the ec2-sgw module and supply the KMS key ARN, as shown in the following example:
module "ec2_sgw" {
source = "aws-ia/storagegateway/aws//modules/ec2-sgw"
vpc_id = var.vpc_id
subnet_id = var.subnet_id
ingress_cidr_blocks = var.ingress_cidr_blocks
create_security_group = true
availability_zone = "us-east-1a"
ssh_public_key_path = var.ssh_public_key_path
# Cache and Root Volume encryption key
cache_block_device = {
kms_key_id = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
}
root_block_device = {
kms_key_id = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
}
}
The s3-nfs-share and s3-smb-share submodules allow you to add KMS encryption for your file shares. To encrypt a file share, add the attribute kms_encrypted = true and supply the kms_key_arn to the submodule, as shown in the following example:
module "nfs_share" {
source = "aws-ia/storagegateway/aws//modules/s3-nfs-share"
share_name = "nfs_share_name"
gateway_arn = module.sgw.storage_gateway.arn
bucket_arn = "s3bucketname:arn"
role_arn = "iamrole:arn"
log_group_arn = "log-group-arn"
client_list = ["10.0.0.0/24","10.0.1.0/24"]
kms_encrypted = true
kms_key_arn = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
}
Credentials management
Refer to this documentation by HashiCorp on setting AWS credentials for Terraform. We recommend setting AWS credentials using environment variables or a named profile to keep them out of the repository and the Terraform state file. When possible, prefer temporary security credentials obtained through an AWS Identity and Access Management (IAM) role.
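For example, a provider block can reference a named profile so that the credentials themselves live in your local AWS configuration rather than in the repository; this is a sketch, and the profile name is illustrative.
provider "aws" {
  region  = var.aws_region
  profile = "storage-gateway-deployer" # named profile from ~/.aws/credentials
}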
Cleanup
To delete all resources associated with this example, configure AWS CLI credentials as described in Step 3 of this post, and change to the examples/s3-nfs-filegateway-ec2 directory. Run the terraform destroy command to delete all resources that Terraform previously created. Note that any resources created outside of Terraform will need to be deleted manually.
Conclusion
In this blog post, we discussed how to provision an EC2-based Storage Gateway using Terraform by HashiCorp. We outlined steps to deploy an AWS Storage Gateway on Amazon EC2, activate your AWS Storage Gateway within AWS, create an Amazon S3 bucket, and create an NFS file share that is ready to be mounted by a client. The use of Infrastructure as Code increases consistency in deployments, speeds up deployment times, and improves operational efficiency, thereby accelerating migrations to the cloud. Here, you can find the Storage Gateway Terraform module in the Terraform Registry. You can customize the module to suit your organization’s needs and assist you in scaling your gateway deployments.
For more information and to learn more about AWS Storage Gateway, see the following:
- What is Amazon S3 File Gateway – AWS Storage Gateway
- Automate Amazon S3 File Gateway deployments in VMware with Terraform by HashiCorp | AWS Storage Blog
- Demo video: Automate deployments of S3 File Gateway in VMware using Terraform
- AWS Storage Gateway | AWS Storage Blog
- Terraform by HashiCorp
- AWS Storage Gateway Terraform module