AWS HPC Blog

Introducing support for per-job Amazon EFS volumes in AWS Batch

Large-scale data analysis usually involves some multi-step process where the output of one job acts as the input of subsequent jobs. Customers using AWS Batch for data analysis want a simple and performant storage solution to share with and between jobs.

We are excited to announce that customers can now use Amazon Elastic File System (Amazon EFS) with AWS Batch to read and write data within their running jobs. With Amazon EFS, storage capacity is elastic, growing and shrinking automatically as you add and remove files, so your jobs have the storage they need, when they need it. AWS Batch customers can also take advantage of the recently announced 3x increase in read throughput for Amazon EFS. When you reference your Amazon EFS file system and container mount point in your AWS Batch job definition, AWS Batch takes care of mounting the file system in your container.

Before this feature release, customers were restricted to staging data to and from independent data services like Amazon Simple Storage Service (Amazon S3) or customized AWS Batch compute environments that automatically mount a shared file system, like Amazon FSx for Lustre or Amazon EFS. With this new launch, customers have a standard and secure way to provide access to data in Amazon EFS at the individual job level. Finally, this feature works with all AWS Batch compute environments, including those leveraging Fargate technology.

Let’s look at how you can implement this new feature in your own AWS environment.

There are different methods for architecting how Amazon EFS and AWS Batch resources interact. For this post, we have opted to showcase how to use Amazon EFS access points to increase the security of accessing data from a job. These are the resources we will be defining for Amazon EFS and AWS Batch:

  1. An Amazon EFS file system with defined mount targets in each Availability Zone it is accessed from. The mount targets’ assigned security group will allow access from the AWS Batch compute environment.
  2. An Amazon EFS access point to restrict an AWS Batch job to accessing data within a subdirectory of the file system’s root.
  3. An AWS Identity and Access Management (IAM) service role to allow an AWS Batch job to access the file system via the access point.
  4. An AWS Batch compute environment that supports mounting Amazon EFS. Recent versions of the Amazon Elastic Container Service (ECS)-optimized AMI (20200319 with container agent version 1.38.0 and later) and Fargate platform version (1.4.0 and later) support this. More information is available in the Amazon EFS volume considerations documentation.
  5. A security group for the AWS Batch compute environment that allows communication with the Amazon EFS mount target. Specifically, the security groups assigned to the file system and compute environment must allow communication on TCP port 2049. For this example, we will leverage the Region’s default VPC and default security group, which allows all traffic across resources assigned to that same default security group.
  6. AWS Batch job definitions that define how to mount the Amazon EFS file system as a source volume and the path that is mounted in the container.
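
If you do not rely on the default security group, the TCP port 2049 requirement from item 5 has to be expressed as an explicit ingress rule. The following is a minimal Python sketch of such a rule, assuming boto3 is installed and credentials are configured. The function names and the security group IDs passed to them are hypothetical and not part of the original walkthrough.

```python
NFS_PORT = 2049  # TCP port that Amazon EFS mount targets listen on


def nfs_ingress_rule(source_security_group_id: str) -> dict:
    """Build one IpPermissions entry that allows NFS traffic from a security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        "UserIdGroupPairs": [{"GroupId": source_security_group_id}],
    }


def allow_nfs(efs_sg_id: str, batch_sg_id: str) -> None:
    """Add the rule to the mount targets' security group.

    Needs boto3 and AWS credentials, so the import is kept local and the
    function is never called at import time.
    """
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=efs_sg_id,
        IpPermissions=[nfs_ingress_rule(batch_sg_id)],
    )
```

With the default security group used in this post, no such rule is needed, because that group already allows all traffic among its members.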

We will discuss more of the nuances and features as we create the resources, and expand on further options in the conclusion section later in the post. Let’s get started building.

How to set up Amazon EFS and AWS Batch

Creating the Amazon EFS file system and access point

To create your Amazon EFS file system:

  1. Open the Amazon EFS management console at https://console.aws.amazon.com/efs/
  2. Choose Create file system to open the Create file system dialog box.
  3. For Name, enter BatchEfsFileSystem.
  4. For Virtual Private Cloud (VPC), choose a VPC. In my case, I use the Region’s default VPC. If you do not have a default VPC, then you need to configure the security group to allow AWS Batch resources to access the file system.
  5. For Availability and Durability, choose Regional. The One Zone storage class is only accessible to compute resources located in subnets within the same Availability Zone. For this example, we want to access the file system from all available Availability Zones in the Region.
  6. Choose Create to create a file system that uses the service recommended settings, such as creating mount targets in each Availability Zone, using the VPC’s default security group, encryption of data at rest enabled using your default key for Amazon EFS, etc. After you create the file system, you can customize the file system’s settings except for availability and durability, encryption, and performance mode.

Since we used the web console to create the file system in the default VPC, mount targets were automatically created in all of the Region’s Availability Zones. The file system mount targets also use the default VPC’s default security group, which allows all network traffic among resources attached to that same security group. Verify that this is the case by choosing the Network tab of the console detail view of the created file system, as shown in the following figure.

The detail view of the Amazon EFS file system showing the network mount targets in subnets and the security group

 

We will also use the default VPC and security group when we create the AWS Batch resources. This will allow for transparent communication between the file system and compute resources. If you do not use the default VPC and security group, you must add the necessary ingress/egress rules to the security group to allow communication between resources. For more information on this topic, refer to the Managing file system network accessibility documentation. If you chose to use One Zone for Availability and durability, then you must also ensure that the subnets that AWS Batch compute resources are launched in correspond to the Amazon EFS mount targets’ Availability Zones. For more information on Amazon EFS storage classes, refer to the Managing EFS storage classes documentation.
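
The console creates mount targets for you, but the API does not, so a scripted setup has to request them explicitly. The sketch below builds the request parameters for the boto3 EFS calls `create_file_system` and `create_mount_target`. It is a hedged illustration: the helper names, the reuse of the file system name as the creation token, and the subnet and security group IDs are assumptions for this example.

```python
def file_system_request(name: str = "BatchEfsFileSystem") -> dict:
    """Parameters for efs.create_file_system, mirroring the console defaults used above."""
    return {
        "CreationToken": name,  # idempotency token; reusing the name is an assumption
        "Encrypted": True,  # encryption at rest with the default key for Amazon EFS
        "PerformanceMode": "generalPurpose",
        "Tags": [{"Key": "Name", "Value": name}],
    }


def mount_target_requests(file_system_id, subnet_ids, security_group_id):
    """One efs.create_mount_target request per subnet (use one subnet per Availability Zone)."""
    return [
        {
            "FileSystemId": file_system_id,
            "SubnetId": subnet_id,
            "SecurityGroups": [security_group_id],
        }
        for subnet_id in subnet_ids
    ]


def create_all(subnet_ids, security_group_id):
    """Send the requests. Needs boto3 and credentials, so it is not called here."""
    import boto3

    efs = boto3.client("efs")
    fs_id = efs.create_file_system(**file_system_request())["FileSystemId"]
    # Mount targets can only be created once the file system reports the
    # "available" lifecycle state; a real script would poll
    # describe_file_systems before continuing.
    for request in mount_target_requests(fs_id, subnet_ids, security_group_id):
        efs.create_mount_target(**request)
    return fs_id
```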

To create the file system access point:

  1. In the Amazon EFS console, in the navigation, choose Access points to open the Access points window.
  2. Choose Create access point to display the Create access point page.
  3. Enter the following information in the Details panel:
    1. For File system, enter the file system ID of the file system you just created.
    2. (Optional) For Name, enter BatchEfsAccessPoint.
    3. For Root directory path, enter /batch/blogExample. This will restrict AWS Batch jobs to just that subdirectory of the file system.
  4. Enter the following in the Root directory creation permissions panel:
    1. For Owner user ID, enter 1000.
    2. For Owner group ID, enter 1000.
    3. For POSIX permissions, enter 0755.
  5. Scroll to the end of the page and choose Create access point to create the access point using this configuration.

Once the Amazon EFS file system and access point are created, take note of the file system ID (fs-xxxxxxxx) and access point ID (fsap-xxxxxxxxxxxxxxxxx) as we will use these within the IAM policy statement.
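
For readers who script their setup, the access point settings entered above map onto the parameters of the boto3 EFS call `create_access_point`. The following is a small sketch under that assumption. The helper name and the reuse of the access point name as the client token are choices made for this example only.

```python
def access_point_request(
    file_system_id: str,
    path: str = "/batch/blogExample",
    owner_uid: int = 1000,
    owner_gid: int = 1000,
    permissions: str = "0755",
    name: str = "BatchEfsAccessPoint",
) -> dict:
    """Parameters for efs.create_access_point matching the console values above."""
    return {
        "ClientToken": name,  # idempotency token; reusing the name is an assumption
        "FileSystemId": file_system_id,
        "RootDirectory": {
            "Path": path,
            # Applied only if the directory does not exist yet.
            "CreationInfo": {
                "OwnerUid": owner_uid,
                "OwnerGid": owner_gid,
                "Permissions": permissions,
            },
        },
        "Tags": [{"Key": "Name", "Value": name}],
    }
```

Passing the result as `boto3.client("efs").create_access_point(**access_point_request("fs-xxxxxxxx"))` should return a response whose `AccessPointId` field holds the fsap- identifier needed later.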

Creating the IAM resources

Next, we will create an IAM policy that allows access to the file system through the access point, and an IAM role that AWS Batch jobs can use to mount the file system and access its data.

To create the IAM policy:

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/
  2. In the navigation pane on the left, choose Policies.
  3. Choose Create policy.
  4. Choose the JSON tab.
  5. Copy and paste the following JSON policy into the text area. Remember to replace the REGION, ACCOUNT_ID, the file system ID (fs-xxxxxxxx) and the access point ID (fsap-xxxxxxxxxxxxxxxxx) in the JSON policy statement.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite"
            ],
            "Resource": "arn:aws:elasticfilesystem:REGION:ACCOUNT_ID:file-system/fs-xxxxxxxx",
            "Condition": {
                "StringEquals": {
                    "elasticfilesystem:AccessPointArn": "arn:aws:elasticfilesystem:REGION:ACCOUNT_ID:access-point/fsap-xxxxxxxxxxxxxxxxx"
                }
            }
        }
    ]
}
  6. When you are finished, choose Next: Tags.
  7. On the Review policy page, type BatchEfsJobDefPolicy for the Name.
  8. Review the policy Summary to see the permissions that are granted by your policy. Then choose Create policy to save your work.
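
Replacing four placeholders by hand is easy to get wrong, so the same policy document can be generated instead. The sketch below is one possible way to do that in Python. The function name and the sample identifiers in the usage note are hypothetical.

```python
import json


def efs_access_policy(region: str, account_id: str, file_system_id: str, access_point_id: str) -> dict:
    """Build the policy shown above with the placeholders filled in."""
    arn_prefix = f"arn:aws:elasticfilesystem:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticfilesystem:ClientMount",
                    "elasticfilesystem:ClientWrite",
                ],
                "Resource": f"{arn_prefix}:file-system/{file_system_id}",
                # Only allow access that goes through this access point.
                "Condition": {
                    "StringEquals": {
                        "elasticfilesystem:AccessPointArn": f"{arn_prefix}:access-point/{access_point_id}"
                    }
                },
            }
        ],
    }


def policy_json(region, account_id, file_system_id, access_point_id) -> str:
    """Serialize the policy for pasting into the console or for iam.create_policy."""
    return json.dumps(efs_access_policy(region, account_id, file_system_id, access_point_id), indent=4)
```

For example, `print(policy_json("us-east-1", "111122223333", "fs-12345678", "fsap-0123456789abcdef0"))` prints a document that can be pasted into the JSON tab.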

To create the IAM role:

  1. In the navigation pane of the IAM console, choose Roles, and then choose Create role.
  2. For Select type of trusted entity, choose AWS service.
  3. Choose Elastic Container Service as the service that you want to allow to assume this role.
  4. Choose Elastic Container Service Task as the use case for your service. Then choose Next: Permissions.
  5. To show the policy you just created, enter BatchEfsJobDefPolicy into the Filter policies search area. Then choose the policy.

Image showing the selection of the right IAM policy for the role

  6. Choose Next: Tags.
  7. Choose Next: Review.
  8. For Role name, enter BatchEfsJobRole.
  9. Choose Create role to create the role.
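
Choosing Elastic Container Service Task as the use case amounts to a trust policy that lets the `ecs-tasks.amazonaws.com` service principal assume the role. The following sketch shows how the same role could be created with boto3. The helper names are hypothetical, and the policy ARN is whatever your account returned when the policy was created.

```python
import json


def ecs_task_trust_policy() -> dict:
    """Trust policy that lets Amazon ECS tasks (and so AWS Batch jobs) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_job_role(policy_arn: str, role_name: str = "BatchEfsJobRole") -> str:
    """Create the role and attach the policy. Needs boto3 and credentials; not called here."""
    import boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ecs_task_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```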

You now have an IAM role for use in AWS Batch job definitions that allows access to this specific Amazon EFS file system. The next step is to define the AWS Batch resources.

While you are using the IAM console, check to see if the Amazon ECS task execution role exists by choosing Roles in the navigation pane, and entering ecsTaskExecutionRole in the search text area. If you do not see the ecsTaskExecutionRole role in the result listing, refer to the AWS Batch documentation on creating the role. We will need both of these roles for creating the AWS Batch resources.

Creating the AWS Batch resources

Creating a managed compute environment using AWS Fargate resources

  1. Open the AWS Batch console at https://console.aws.amazon.com/batch/
  2. In the navigation pane, choose Compute environments, Create.
  3. Configure the environment.
    1. For Compute environment type, choose Managed.
    2. For Compute environment name, enter BatchEfsCE.
    3. Ensure that Enable compute environment is selected so that your compute environment can accept jobs from the AWS Batch job scheduler.
    4. Expand Additional settings: Service role, instance role, EC2 key pair.
      1. For Service role, choose Batch service-linked role.
  4. Configure your Instance configuration.
    1. For Provisioning model, choose Fargate to launch Fargate On-Demand resources.
    2. For Maximum vCPUs, enter 10.
  5. Configure networking.
    1. For VPC ID, choose a VPC where you intend to launch your instances.
    2. For Subnets, choose which subnets in the selected VPC should host your instances. By default, all subnets within the selected VPC are chosen.
    3. Expand Additional settings: Security groups, EC2 tags.
      1. For Security groups, choose a security group to attach to your instances. By default, the default security group for your VPC is chosen. At this point you should check that the VPC ID, subnets, and security group are the same as what was chosen for Amazon EFS. This is verifiable by comparing the Subnets and Security groups shown in the console to the Amazon EFS file system details.

A detail of the Batch compute environment network settings showing the same subnets and security group as the EFS file system

  6. Choose Create compute environment to finish.
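
The same compute environment can be described as parameters for the boto3 Batch call `create_compute_environment`. This is a sketch rather than a definitive setup: the helper name and the subnet and security group IDs are placeholders, and we assume that omitting `serviceRole` makes AWS Batch use its service-linked role, as selected in the console steps.

```python
def fargate_compute_environment(
    subnet_ids,
    security_group_ids,
    name: str = "BatchEfsCE",
    max_vcpus: int = 10,
) -> dict:
    """Parameters for batch.create_compute_environment using Fargate On-Demand."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",  # lets the environment accept jobs from the scheduler
        "computeResources": {
            "type": "FARGATE",
            "maxvCpus": max_vcpus,
            # Use the same subnets and security group as the EFS mount targets.
            "subnets": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
    }
```

A call such as `boto3.client("batch").create_compute_environment(**fargate_compute_environment(subnets, groups))` would then submit the request.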

To create the job queue:

  1. From the navigation pane of the AWS Batch console, choose Job queues, Create.
  2. For Job queue name, enter BatchEfsJQ.
  3. For Priority, leave 1 as the default.
  4. In the Connected compute environments section, select the BatchEfsCE compute environment from the list to associate it with the job queue.
  5. Choose Create to finish and create your job queue.
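
As a scripted counterpart to these steps, the job queue maps onto the boto3 Batch call `create_job_queue`. The helper below is a hypothetical sketch that only builds the parameters.

```python
def job_queue_request(
    compute_environment: str = "BatchEfsCE",
    name: str = "BatchEfsJQ",
    priority: int = 1,
) -> dict:
    """Parameters for batch.create_job_queue with a single connected compute environment."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment}
        ],
    }
```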

Now that we have a compute environment and job queue, the last resources we need are two job definitions. AWS Batch job definitions define how an Amazon EFS file system is mounted and used within the underlying container. We will create two job definitions, one that will create a file (createEfsFile) in the file system, and another that reports what files are present (listEfsDir). We will run them in sequence, listEfsDir → createEfsFile → listEfsDir, and look at the results. We could wait for each job to complete before submitting the next one, but in this example we will use the AWS Batch job dependency feature to start each job only after the job it depends on is complete.

To create the job definitions:

  1. In the AWS Batch console, in the navigation pane, choose Job definitions, Create.
  2. For Name, enter listEfsDir
  3. For Platform, choose Fargate
  4. In Container properties
    1. For Command enter ls -l -a /mount/efs/ into the text area.
    2. For Execution role, select the ecsTaskExecutionRole.
    3. For Assign public IP, select Enable
    4. For Fargate platform version, enter 1.4.0
    5. Expand the Additional configuration section, and choose BatchEfsJobRole you created for the Job role
    6. In the Mount points section, you can configure mount points for your job’s container to access.
      1. For Source volume, enter efs as the name of the volume to mount
      2. For Container path, enter /mount/efs as the path on the container at which to mount the host volume.
    7. In the Volumes section:
      1. Choose Add volume, and turn on the Enable EFS option.
      2. For Name, enter efs.
      3. For File system ID, enter the ID of your Amazon EFS file system (it should resemble fs-xxxxxxxx).
      4. For Root directory, enter / (a single forward slash).
      5. Turn on Enable transit encryption.
      6. Do not set a value for Transit encryption port. When no port is given, the port selection strategy of the Amazon EFS mount helper is used.
      7. For Access point ID, enter the ID of your file system’s access point (it should resemble fsap-xxxxxxxxxxxxxxxxx).
      8. For Use selected job role, choose Enable.
  5. Scroll down to the bottom of the form and choose Create

To create the job definition for creating a file, repeat the preceding steps with the following changes:

  • In step 2, for Name, enter createEfsFile.
  • In step 4, substep 1, for Command enter touch /mount/efs/hello_world.txt into the text area.

You should now have two AWS Batch job definitions, one to create a file and another to list the files within a directory.
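
The console fields above correspond to the `containerProperties` structure of the boto3 Batch call `register_job_definition`. The sketch below shows that mapping for both job definitions. Several values are assumptions that the walkthrough does not state: the container image, the 0.25 vCPU and 512 MiB resource sizes, and the example role ARNs in the usage note.

```python
def efs_job_definition(
    name,
    command,
    file_system_id,
    access_point_id,
    job_role_arn,
    execution_role_arn,
    image="public.ecr.aws/amazonlinux/amazonlinux:latest",  # assumed image
):
    """Parameters for batch.register_job_definition with an EFS volume on Fargate."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image,
            "command": list(command),
            "jobRoleArn": job_role_arn,
            "executionRoleArn": execution_role_arn,
            # Assumed sizes; any valid Fargate vCPU and memory pair works.
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},
            ],
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
            "fargatePlatformConfiguration": {"platformVersion": "1.4.0"},
            "mountPoints": [
                {"sourceVolume": "efs", "containerPath": "/mount/efs", "readOnly": False}
            ],
            "volumes": [
                {
                    "name": "efs",  # must match sourceVolume above
                    "efsVolumeConfiguration": {
                        "fileSystemId": file_system_id,
                        "rootDirectory": "/",
                        "transitEncryption": "ENABLED",
                        "authorizationConfig": {
                            "accessPointId": access_point_id,
                            "iam": "ENABLED",  # use the job role for authorization
                        },
                    },
                }
            ],
        },
    }
```

The two definitions then differ only in name and command, for example `efs_job_definition("listEfsDir", ["ls", "-l", "-a", "/mount/efs/"], ...)` and `efs_job_definition("createEfsFile", ["touch", "/mount/efs/hello_world.txt"], ...)`.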

Running the jobs

Now that everything is in place, we can submit the AWS Batch job requests for our simple workflow using dependencies for later tasks.

To create the initial job request:

  1. In the navigation pane, choose Jobs, Submit job.
  2. For Job name, enter listDir1.
  3. For Job definition, choose the listEfsDir job definition you created earlier.
  4. For Job queue, choose the BatchEfsJQ job queue you created earlier.
  5. Scroll to the bottom of the form and choose Submit job.

You should see a success message at the top of the console for the submitted job. Copy the job ID from the dialog box. It should resemble 98d0af90-8606-4297-b9bd-a0d537195192, as shown in the following image.

Notification that an AWS Batch job was created

To create the dependent job requests:

  1. In the navigation pane, choose Jobs, Submit job.
  2. For Job name, use createHelloFile.
  3. For Job dependency, fill in the job ID of the listDir1 job.
  4. For Job definition, choose the createEfsFile job definition you created earlier.
  5. For Job queue, choose the BatchEfsJQ job queue.
  6. Scroll to the bottom of the form and choose Submit job.
  7. In the navigation pane, choose Jobs, Submit job.
  8. For Job name, enter listDir2
  9. For Job dependency, fill in the job ID of the createHelloFile job that you just created.
  10. For Job definition, choose the listEfsDir job definition.
  11. For Job queue, choose the BatchEfsJQ job queue.
  12. Scroll to the bottom of the form and choose Submit job.
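
Copying job IDs between forms can also be automated. The sketch below chains the three jobs so that each one depends on the previous one, using the `dependsOn` parameter of the boto3 Batch call `submit_job`. The `submit_chain` helper and the `WORKFLOW` list are hypothetical names introduced for this example. The submit function is passed in so the logic can be tried without AWS access.

```python
from typing import Callable, List, Tuple

# (job name, job definition) pairs in the order used in this post.
WORKFLOW = [
    ("listDir1", "listEfsDir"),
    ("createHelloFile", "createEfsFile"),
    ("listDir2", "listEfsDir"),
]


def submit_chain(submit: Callable[..., dict], queue: str, steps: List[Tuple[str, str]]) -> List[str]:
    """Submit jobs in order, making each job depend on the one submitted before it.

    `submit` is expected to behave like boto3's batch.submit_job: it takes
    keyword arguments and returns a dict containing "jobId".
    """
    job_ids: List[str] = []
    for job_name, job_definition in steps:
        request = {
            "jobName": job_name,
            "jobQueue": queue,
            "jobDefinition": job_definition,
        }
        if job_ids:
            # Start only after the previously submitted job completes.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        job_ids.append(submit(**request)["jobId"])
    return job_ids
```

With credentials configured, `submit_chain(boto3.client("batch").submit_job, "BatchEfsJQ", WORKFLOW)` would submit all three jobs in one go.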

At this point the console should show the first job in some initialization or running state and the other two as PENDING.

A view of the AWS Batch running jobs showing the ones that were just submitted. One job is shown as RUNNING and the others have PENDING status.

Once the first job completes, you can check the standard output in the linked Amazon CloudWatch log stream from the listDir1 job detail page.

The result of listing the directory, which is empty.

 

After the other two jobs complete, you should now see a new file in Amazon EFS with the name hello_world.txt in the standard output written to the log stream linked from the listDir2 job detail page.

The result of listing the directory, which has one file called hello_world.txt
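
The same output can be fetched without the console. The sketch below looks up the log stream of a job with `describe_jobs` and reads it with the CloudWatch Logs call `get_log_events`. It assumes the job writes to the default AWS Batch log group `/aws/batch/job`. The `job_output` helper is a hypothetical name, and the clients are passed in as arguments.

```python
def job_output(batch, logs, job_id: str, log_group: str = "/aws/batch/job") -> list:
    """Return the standard output lines that a finished job wrote to its log stream.

    `batch` and `logs` are expected to behave like boto3.client("batch") and
    boto3.client("logs").
    """
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    stream = job["container"]["logStreamName"]
    response = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream,
        startFromHead=True,  # read from the beginning of the stream
    )
    return [event["message"] for event in response["events"]]
```

For the listDir2 job, the returned lines should include an entry for hello_world.txt.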

Cleanup

To avoid additional charges, clean up the resources by deleting the Amazon EFS file system. This will also delete the data and other resources associated with the file system, like the mount targets and access points.

For good hygiene, you should also delete the BatchEfsJobDefPolicy IAM policy and BatchEfsJobRole IAM role since these were scoped to the deleted file system and they are no longer needed. Similarly, you should delete the AWS Batch listEfsDir and createEfsFile job definitions. The other AWS Batch resources can be left as is, as they do not incur costs, but you can choose to disable and delete them as well.
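
If you prefer to script the cleanup, the order of operations matters. The sketch below lists one plausible sequence of boto3 calls as plain data. It rests on a few assumptions: the job definitions are at revision 1, the mount target IDs were looked up beforehand with `describe_mount_targets`, and the access point goes away with the file system as described above for the console. Unlike the console, the API does not remove mount targets for you, and their deletion is asynchronous, so a real script would wait before the final call.

```python
from typing import List, Sequence, Tuple

Step = Tuple[str, str, dict]  # (boto3 client name, operation name, keyword arguments)


def cleanup_plan(
    file_system_id: str,
    mount_target_ids: Sequence[str],
    policy_arn: str,
    role_name: str = "BatchEfsJobRole",
    job_definitions: Sequence[str] = ("listEfsDir:1", "createEfsFile:1"),  # assumed revisions
) -> List[Step]:
    """Return the cleanup calls in an order that respects their dependencies."""
    plan: List[Step] = []
    for job_definition in job_definitions:
        plan.append(("batch", "deregister_job_definition", {"jobDefinition": job_definition}))
    # A role cannot be deleted while managed policies are attached to it.
    plan.append(("iam", "detach_role_policy", {"RoleName": role_name, "PolicyArn": policy_arn}))
    plan.append(("iam", "delete_role", {"RoleName": role_name}))
    plan.append(("iam", "delete_policy", {"PolicyArn": policy_arn}))
    # Mount targets must be gone before the file system can be deleted.
    for mount_target_id in mount_target_ids:
        plan.append(("efs", "delete_mount_target", {"MountTargetId": mount_target_id}))
    plan.append(("efs", "delete_file_system", {"FileSystemId": file_system_id}))
    return plan
```

Each step could be run with `getattr(boto3.client(service), operation)(**kwargs)`, pausing before `delete_file_system` until `describe_mount_targets` returns no mount targets.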

Conclusion

In this post, we announced a new feature where AWS Batch can access shared data from Amazon EFS file systems at the job level. We walked through an example that showcased Amazon EFS access points to restrict access to a subdirectory of the file system, in addition to enforcing other security measures such as encryption in transit and encryption of data at rest.

There are more Amazon EFS features we did not cover that you may be interested in. For example, access points also provide file system native security features such as enforcing POSIX user and group permissions for data read and write requests. You can read more about these features in the Enforcing a user identity using an access point documentation. We also did not cover securing the file system mount targets. The default EFS file system policy grants full access to any client that can connect to the file system using a file system mount target. The default policy is in effect whenever a user-configured file system policy is not in effect, including at file system creation. To secure the mount targets with a file system policy, refer to the documentation on using IAM to control file system data access.

Using Amazon EFS within your AWS Batch environment provides a low-friction and performant method for sharing data across and between AWS Batch jobs. To learn more about leveraging Amazon EFS with AWS Batch, check out the full AWS Batch documentation on using Amazon EFS volumes.

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.