AWS HPC Blog
Strategies for distributing executable binaries across grids in financial services
Financial Services Institutions (FSIs) rely heavily on compute grids for their daily operations. One key challenge of operating such grids is the distribution of data, including the distribution of binaries required for computation.
In this blog post, we’ll focus on classifying different binary distribution techniques and providing intuition and guidance for customers about when to use each technique in connection with their business objectives and requirements. We’ll also offer insights and recommendations that others can apply outside of the FSI industry.
The problem
Most customer concerns regarding data distribution revolve around business objectives and cost. Chief among them is distributing large volumes of data regularly, cost-effectively, and quickly enough to ensure a rapid start for compute instances. This avoids subsequent delays throughout the execution, and often translates into cost savings, too.
Consider a grid configured to scale to 5,000 x Amazon Elastic Compute Cloud (Amazon EC2) instances. With fluctuating demand causing frequent scaling events, each new instance has to download the binary files it needs. Assuming up to 5 scaling events daily, with a 1-minute download time per instance, this results in over 416 instance-hours – about 17 instance-days – of wasted EC2 time every day, highlighting the value of efficient binary distribution.
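For intuition, here is that back-of-the-envelope calculation written out as a minimal sketch, using the illustrative numbers above (substitute your own grid size, event count, and download time):

```python
# Back-of-the-envelope estimate of EC2 time spent idle while downloading
# binaries. All inputs are the illustrative assumptions from the text.
instances = 5_000            # peak grid size
scaling_events_per_day = 5   # full scale-out events per day
download_minutes = 1.0       # per-instance binary download time

idle_minutes = instances * scaling_events_per_day * download_minutes
idle_hours = idle_minutes / 60
print(f"{idle_hours:.0f} instance-hours (~{idle_hours / 24:.1f} instance-days) per day")
# -> 417 instance-hours (~17.4 instance-days) per day
```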
How to assess your options
Choosing the suitable data layer is not just a technical decision but a strategic one that impacts the cost efficiency of the entire computing grid. The wrong choice can lead to significant inefficiencies and increased costs. Therefore, it’s crucial to consider the options and their implications carefully.
What factors should customers consider when evaluating their options for binary distribution based on workload requirements and characteristics?
- Size – What is the total size of the super-set of all binaries that must be present on the Amazon EC2 instance? What is the size of the most commonly used binary files sufficient to start the workload in a lazy loading situation (i.e., waiting for the rest of the files to be delivered later)?
- Size distribution – Does the binary set consist of a large number of small files, or a small number of large files?
- Scaling – What's the maximum number of instances you need to have running simultaneously? What's the maximum scaling speed (the higher the speed, the more bandwidth necessary)?
- Semantics – Does the workload require POSIX file access? Will it be necessary to have excess capacity, to allow adding to, or altering, binary sets during the production stage?
Options
Amazon Simple Storage Service (Amazon S3)
Amazon S3 is the AWS object storage service, allowing users to store and retrieve any data from anywhere. Amazon S3 can act as a common destination for all built and versioned binary files, and distribute these files among Amazon EC2 instances.
To illustrate typical tradeoffs, we begin with the most intuitive approach: copying the required files from Amazon S3. A common destination could be the EC2 instance store or an Amazon Elastic Block Store (Amazon EBS) volume attached to the EC2 instance. A grid administrator can script the copy operation at instance boot time (e.g., using aws s3 sync).
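A minimal sketch of such a boot-time copy is shown below. The bucket, prefix, and destination path are hypothetical placeholders; in practice you would typically run this from instance user data or a systemd unit.

```python
#!/usr/bin/env python3
"""Boot-time copy of versioned binaries from S3 to local storage (sketch)."""
import subprocess
import time

BUCKET = "my-grid-binaries"          # hypothetical bucket name
PREFIX = "releases/2024-06-01"       # hypothetical versioned prefix
DESTINATION = "/opt/grid/binaries"   # local EBS or instance store path

start = time.monotonic()
# Mirror the versioned prefix onto local storage before the workload starts.
subprocess.run(
    ["aws", "s3", "sync", f"s3://{BUCKET}/{PREFIX}", DESTINATION,
     "--only-show-errors"],
    check=True,  # fail loudly so the node never joins the grid with partial binaries
)
print(f"Binary sync completed in {time.monotonic() - start:.1f}s")
```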
The primary advantage is that once files are copied, the workload has the lowest latency and highest bandwidth for accessing these files. Performance isn’t impacted by the total number of EC2 instances as each instance has its own volume.
Furthermore, Amazon S3 provides very high throughput, sufficient even for the most demanding use cases. Combined with the pay-as-you-go model, grid administrators don’t need to scale S3 to match fluctuating bandwidth demands, simplifying administration.
Amazon S3 is a regional service with no charges for data transfers across Availability Zones (AZs) and has very low storage costs. In the context of binary distribution, the bulk of the cost will be associated with the number of files transferred, which comes from API calls. This is important to consider for binaries consisting of many small files.
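To see why the per-file API cost matters, here is a rough request-cost estimate for a single full scale-out. This is a sketch with assumed file counts and an assumed per-request price; check the current S3 pricing for your Region before relying on the numbers.

```python
# Rough estimate of S3 GET request costs for one full scale-out.
# All inputs are assumptions for illustration only.
instances = 5_000
files_per_instance = 20_000    # a binary set made of many small files
get_price_per_1000 = 0.0004    # assumed USD per 1,000 GET requests

requests = instances * files_per_instance
cost = requests / 1_000 * get_price_per_1000
print(f"{requests:,} GET requests ~= ${cost:,.2f} per scale-out")
# -> 100,000,000 GET requests ~= $40.00 per scale-out
```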
The key tradeoff is the initial delay while the files are copied from S3. During this time, the EC2 instance has to idle until all the data is copied. You can partially mitigate this by splitting your binaries and downloading them in parts – at the expense of the added complexity of managing this.
Amazon S3 Express One Zone (EOZ)
Amazon S3 Express One Zone (EOZ) stores data in a single Availability Zone, reducing data access latency and increasing bandwidth from within that AZ. Because FSI grids often require significant computing capacity, our recommendation is to diversify EC2 instance types and use as many AZs as possible. However, this conflicts with the benefits of single-AZ solutions.
Grid administrators faced with this have two main options. They can run a single instance of the storage service in a single AZ, which partially negates the latency and bandwidth advantages whenever instances in other AZs access the data. Alternatively, they can run a separate instance of the storage in each AZ – this increases storage costs and adds the complexity of synchronizing data between S3 EOZ buckets before the start of the workload.
Mountpoint for Amazon S3
Mountpoint for Amazon S3 is an open-source file client that enables easy access to objects in an S3 bucket through standard file system APIs on your EC2 instance.
With Mountpoint, files stay in S3 and are only cached locally when accessed, unlike the copy-based technique described earlier. This makes it possible to start the workload immediately and load files on demand throughout the execution. This approach is particularly useful when you don't know beforehand which binary files the workload will require, or when only a small fraction of the files is likely to be needed.
However, the first execution of the workload will still experience higher latency than subsequent executions due to the need to cache files locally.
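A minimal sketch of mounting the bucket at boot is shown below. The bucket name, mount path, and cache directory are placeholders, and the --cache option assumes a Mountpoint release that supports local caching.

```python
#!/usr/bin/env python3
"""Mount an S3 bucket with Mountpoint for Amazon S3 at instance boot (sketch)."""
import pathlib
import subprocess

BUCKET = "my-grid-binaries"             # hypothetical bucket name
MOUNT_POINT = "/opt/grid/binaries"      # where the workload expects its files
CACHE_DIR = "/var/cache/mountpoint-s3"  # local cache for repeatedly read binaries

pathlib.Path(MOUNT_POINT).mkdir(parents=True, exist_ok=True)
pathlib.Path(CACHE_DIR).mkdir(parents=True, exist_ok=True)

# Expose the bucket as a read-only file system; binaries are fetched from S3
# lazily the first time the workload opens them, then served from the cache.
subprocess.run(
    ["mount-s3", BUCKET, MOUNT_POINT, "--read-only", "--cache", CACHE_DIR],
    check=True,
)
```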
Amazon Elastic File System (EFS)
EFS is a fully managed, network-attached storage service that supports the NFS protocol. It provides a POSIX file access API and supports up to 20 GB/s of read throughput, up to 250,000 read IOPS, and up to 25,000 connections. The actual performance will depend on the throughput mode selected for EFS – Elastic, Provisioned, or Bursting – which you should choose after assessing your workload requirements (see the EFS documentation for more details).
Unlike S3-based solutions, there are no charges associated with accessing individual files stored on EFS. Instead, there is a cost associated with the amount of data transferred.
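With EFS, each instance simply mounts the shared file system at boot instead of copying anything. Here is a minimal sketch with a placeholder file system DNS name and the commonly recommended NFS mount options:

```python
#!/usr/bin/env python3
"""Mount a shared EFS file system containing the grid binaries (sketch)."""
import pathlib
import subprocess

EFS_DNS = "fs-0123456789abcdef0.efs.us-east-1.amazonaws.com"  # placeholder
MOUNT_POINT = "/opt/grid/binaries"

pathlib.Path(MOUNT_POINT).mkdir(parents=True, exist_ok=True)

# Read-only NFSv4.1 mount; every instance in the grid sees the same files.
subprocess.run(
    ["mount", "-t", "nfs4",
     "-o", "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,ro",
     f"{EFS_DNS}:/", MOUNT_POINT],
    check=True,
)
```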
Similar to S3 Express One Zone, there is also EFS One Zone (OZ). However, in addition to the tradeoffs associated with single-AZ solutions (which we described earlier), EFS OZ incurs cross-AZ data transfer charges, so we don't recommend using EFS OZ for this task.
Amazon FSx
Amazon FSx lets you choose between several widely-used, distributed, network-attached file systems: Lustre, NetApp ONTAP, OpenZFS, and Windows File Server. Each of these file systems provides even greater throughput and lower latency than EFS. You can explore a more detailed comparison of these services using our published advice.
This post focuses on FSx for Lustre (FSxL), which can scale to provide 1 TB/s of throughput (or more), millions of IOPS, and sub-millisecond access latency, making it the best choice for many large-scale workloads.
FSxL also integrates with Amazon S3, allowing users POSIX-level file access to files stored on S3. When a client attempts to access a file, FSx first retrieves it from S3 and caches it at the FSxL server, making it readily available for subsequent requests by other users.
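As an illustration, an administrator might link an existing FSxL file system to the bucket that holds the binaries using a data repository association. This is a sketch: the file system ID, Lustre path, and bucket name are placeholders.

```python
"""Link an FSx for Lustre file system to the S3 bucket holding the binaries (sketch)."""
import boto3

fsx = boto3.client("fsx")

# Once the association is in place, clients see the S3 objects as POSIX files,
# and FSxL lazily loads file contents from S3 on first access.
fsx.create_data_repository_association(
    FileSystemId="fs-0123456789abcdef0",        # existing FSxL file system
    FileSystemPath="/binaries",                 # where the objects appear in Lustre
    DataRepositoryPath="s3://my-grid-binaries", # hypothetical bucket
    BatchImportMetaDataOnCreate=True,           # import file metadata up front
)
```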
FSxL can be configured to match a target capacity and/or throughput, but scaling it is a task the administrator must perform manually. FSxL deploys into a single AZ, meaning file access from other AZs will incur additional data transfer charges.
Both FSxL and EFS offer full POSIX file-access support, allowing writes by any Amazon EC2 instance to be available to other instances. While binary files are generally immutable and don’t need this feature, it’s beneficial for the same data layer to be used for other needs like providing computational input, storing results, or maintaining shared state.
Amazon File Cache
Amazon File Cache is a high-speed caching service similar to FSxL but with a number of distinct differences.
First, in addition to supporting Amazon S3, File Cache can support data repositories using NFSv3, which means that the source files can be located virtually anywhere, including on-premises. This can simplify the process of delivering quant libraries – especially if they are built on-premises.
File Cache automatically loads files and metadata from the origin and releases the least recently-used cached files to ensure the most active files are available in the cache for your applications. File Cache is built using Lustre, which means effective I/O throughput can be very high.
Amazon EBS volumes
Amazon EBS volumes are persistent block storage devices that can be attached to an EC2 instance. They function like HDD/SSD drives, with varying performance characteristics and costs. For example, io2 Block Express volumes offer a maximum throughput of 4,000 MB/s and 256,000 IOPS per volume.
To use EBS volumes for binary distribution, you can follow these steps (a code sketch follows the list):
- Create a source EBS volume containing the binary distribution; you can do this as part of a CI/CD pipeline.
- Create a persistent EBS snapshot, which is a point-in-time copy of an EBS volume saved in S3. You can use EBS snapshots to restore and create multiple copies of the original EBS volume.
- Configure an EC2 Launch Template to include an EBS volume based on the newly created snapshot.
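Here is a minimal sketch of steps 2 and 3 using boto3. The volume ID, AMI ID, and device name are placeholders, and step 1 – building the source volume in CI/CD – is assumed to have already happened.

```python
"""Snapshot a pre-populated binaries volume and reference it in a launch template (sketch)."""
import boto3

ec2 = boto3.client("ec2")

# Step 2: snapshot the volume that the CI/CD pipeline populated with binaries.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",       # source volume containing the binaries
    Description="grid binaries 2024-06-01",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Step 3: a launch template whose instances get a volume restored from that snapshot.
ec2.create_launch_template(
    LaunchTemplateName="grid-worker",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvdf",       # mounted by the instance at boot
            "Ebs": {
                "SnapshotId": snapshot["SnapshotId"],
                "VolumeType": "gp3",
                "DeleteOnTermination": True,
            },
        }],
    },
)
```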
When an EBS volume is created from a snapshot and attached to an EC2 instance, only the metadata of the volume is loaded initially. The volume’s data blocks are not loaded from S3 into the EBS block storage until they’re accessed or needed by the instance. These blocks will be lazy-loaded as they’re being accessed. As a result, the very first access to these blocks (and consequently the first workload invocation) will incur additional latency (we explain this in more detail in our documentation).
When creating EBS volumes from a snapshot within the same region, there are no additional charges for data transfer, i.e., you only pay for the GBs of storage capacity.
For completeness, we should mention that io1 and io2 volume types enable you to attach a single EBS volume to up to 16 x EC2 instances within the same AZ, reducing the total number of EBS volumes needed. However, this comes with the tradeoff of additional management effort – mapping volumes to EC2 instances. Also, the total IOPS performance across all the attached instances cannot exceed the volume's maximum provisioned IOPS. Finally, you should also consider EBS performance when sharing a volume among multiple concurrent workers running on a large EC2 instance.
AMIs are templates that contain the software configuration required to launch an Amazon EC2 instance. An AMI often includes a reference to one or more EBS snapshots. In the context of binary distribution, AMIs can be thought of as an abstraction layer on top of EBS volumes, removing the need to manually attach an EBS volume to an instance. There are no performance or cost benefits to prebaking binary files into AMIs over using EBS volumes, so the choice comes down to management overhead and the grid's architecture. We explore this topic in an article about Prebaking vs. bootstrapping AMIs.
Warm Pools in EC2 Auto Scaling let you pre-initialize EC2 instances and then stop them, until you need to scale out. The EBS volume continues to hydrate in the background while the instance is stopped, saving you time and resources when you’re ready to scale, since your instances will spin up quickly and EBS volumes will be hydrated.
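For example, adding a warm pool to an existing Auto Scaling group takes a single API call. This is a sketch: the group name and pool sizing are placeholder assumptions.

```python
"""Add a warm pool of stopped, pre-initialized instances to an Auto Scaling group (sketch)."""
import boto3

autoscaling = boto3.client("autoscaling")

# Instances in the pool have already run their bootstrap (binary download,
# EBS hydration) and sit stopped, so a scale-out only pays the start-up cost.
autoscaling.put_warm_pool(
    AutoScalingGroupName="grid-workers",  # hypothetical Auto Scaling group
    PoolState="Stopped",                  # keep instances stopped until needed
    MinSize=200,                          # pre-initialized capacity held in reserve
)
```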
Fast Snapshot Restore (FSR) is a feature of Amazon EBS that optimizes EBS snapshots for faster volume restoration. When an EBS snapshot is marked for FSR, the EBS volumes restored from this snapshot can benefit from full performance immediately as the volume is created, without the need for a lazy loading mechanism. There is a credit-based system associated with the rate at which you can create EBS volumes from an FSR-enabled snapshot. These limits might render this method unsuitable for large-scale grids.
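Enabling FSR on the snapshot that your pipeline produces is likewise a single call. This is a sketch: the snapshot ID and AZ list are placeholders, and the credit limits mentioned above still apply.

```python
"""Enable Fast Snapshot Restore on the binaries snapshot (sketch)."""
import boto3

ec2 = boto3.client("ec2")

# Volumes restored from an FSR-enabled snapshot deliver full performance
# immediately, but volume creation from it is rate-limited per AZ by credits.
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=["us-east-1a", "us-east-1b"],  # AZs where the grid runs
    SourceSnapshotIds=["snap-0123456789abcdef0"],    # binaries snapshot
)
```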
Summary of options discussed
Let’s now summarize the relevant characteristics of these services in a table (Figure 1). The Source column shows where the binaries come from, while the Destination column indicates if files are relocated or accessed remotely.
The next column indicates whether all files are accessible as soon as the EC2 instance boots. For example, if the files are stored on Amazon S3 and need to be copied to EBS before the workload can start, this copy step requires manual implementation. Next, some services deliver full performance on the first invocation, while others do not. For example, EFS provides full performance right away, but restoring an EBS volume from a snapshot may result in longer access times initially, due to hydration.
The cost column indicates the workload dimensions that are most likely to account for the bulk of the cost as your service scales horizontally. We've shown this column only for indicative purposes – review the costs of each service yourself to get an accurate estimate. For example, for a workload with many small files stored on Amazon S3 and accessed via Mountpoint for Amazon S3, there is a cost associated with API calls per file. In contrast, the cost of FSxL varies based on provisioned throughput and total data transferred.
Finally, the last three columns indicate whether the same service can be reused to deliver more files at runtime. For example, you can place a new version of a binary distribution on S3 and retrieve it via Mountpoint. But if you distribute the files using EBS or an AMI, then a new EBS volume needs to be mounted – or a different method used – to get the updates. The global file system column indicates whether writes from one instance are available for reading by other instances in the grid.
Some single AZ services have costs associated with cross-AZ network traffic, so refer to the documentation of each service for details.
Conclusion and recommendations
The purpose of this post is to provide a general guide on how to reason about the challenge of binary distribution in large FSI grids.
Our suggestion is to begin by evaluating the problem dimensions outlined in this post, in relation to your workload. Then, consider whether binaries need to be relocated closer to the computing resources or if they can be accessed over the network. We recommend starting with the simplest solution that fits the requirements, testing it, and then determining if further improvements are necessary.
When moving files to a compute instance, there may be a high initial overhead, but it can ensure consistent, isolated high performance. When it comes to network file systems, it’s important to keep in mind that they are shared among all compute instances that access them. Thus, NFS should be pre-provisioned for the load and scale you anticipate. Finally, you should always consider cross-AZ traffic costs and latency when making your choice.
Managing this complexity often requires balancing performance and cost. You can apply your knowledge of your workloads, and some simple heuristics, to selecting among the distribution methods presented in this post to mitigate many of the trade-offs, and feel good about your decisions.