AWS Partner Network (APN) Blog

Breaking Cloud Barriers: WEKA Redefines Cloud Storage Performance

By Boni Bruno, Director Technical Marketing & Performance Engineering – WEKA
By Venkatesh Aravamudan, Sr. Partner Solutions Architect – AWS
By Gaurav Bhatnagar, Solutions Architect – AWS

Managing compute-intensive Artificial Intelligence (AI) and High Performance Computing (HPC) workloads presents several challenges, largely because data must be ingested, processed, and integrated from diverse sources. The volume and velocity of that data, combined with the need for real-time or near-real-time processing, make it difficult to ensure data integrity and consistency across distributed systems. Sustaining high throughput and low latency while managing network bandwidth and storage resources demands sophisticated, resource-intensive data management. Scaling to meet fluctuating demand adds further complexity, particularly when balancing cost against performance, and mixed I/O patterns complicate system management even more. Organizations must address these challenges to unlock the full potential of their compute-intensive workloads and achieve optimal performance and cost-efficiency.

WEKA, an AWS Partner with the AWS Storage Competency and the AWS Outposts Service Ready designation, offers a modern file system that helps organizations tackle storage challenges in high performance computing.

This post explores how WEKA redefines cloud storage performance for AI and HPC workloads. By combining WEKA's file system with AWS cloud infrastructure, organizations can achieve exceptional storage performance and unlock new possibilities for their AI and HPC workloads.

About the WEKA Data Platform

Built on Amazon Web Services (AWS), the WEKA Data Platform is a software-only, highly scalable, and easy-to-manage storage solution. The WEKA file system combines the performance of all-flash arrays, the simplicity and feature set of Network-Attached Storage (NAS), and the scalability and economics of the cloud, making it a versatile and powerful choice for data-intensive technical computing.

A WEKA cluster can be deployed into an Amazon Virtual Private Cloud (Amazon VPC) using WEKA's ready-to-deploy Terraform module. The following diagram outlines the automated Terraform steps for provisioning the WEKA cluster's backend servers on Amazon EC2 instances.

Figure 1: WEKA cloud deployment on AWS

The minimum AWS Identity and Access Management (IAM) policy permissions required for the installation user are documented in the WEKA documentation.
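After the Terraform configuration is in place, deployment follows Terraform's standard init, plan, and apply workflow. The Python sketch below only assembles those commands and optionally runs them. It assumes the following:

- Terraform is installed.
- The installation user's AWS credentials carry the minimum IAM permissions.
- A hypothetical `weka-cluster` directory holds a root module that references WEKA's published Terraform module. The module source and its variables are not shown here and should be taken from WEKA's documentation.

The helper names `terraform_commands` and `deploy` are illustrative.

```python
import subprocess


def terraform_commands(workdir: str, plan_file: str = "tfplan") -> list[list[str]]:
    """Build the init -> plan -> apply sequence for a Terraform root module.

    workdir is a hypothetical directory whose main.tf references WEKA's
    Terraform module (module source and variables are not shown here).
    """
    # -chdir is a global option, so it must come before the subcommand.
    base = ["terraform", f"-chdir={workdir}"]
    return [
        base + ["init", "-input=false"],
        base + ["plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan runs exactly what was reviewed, without prompting.
        base + ["apply", "-input=false", plan_file],
    ]


def deploy(workdir: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in terraform_commands(workdir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    deploy("weka-cluster")  # dry run: prints the commands only
```

With `dry_run=True` the sketch prints the commands for review. With `dry_run=False` it executes them in order and stops at the first failing step.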

The network architecture below illustrates a sample deployment of WEKA backend and client Amazon EC2 instances in a private subnet within a single Availability Zone.

Figure 2: Detailed view of the WEKA cluster from Figure 1

This architecture provides a blueprint for deploying WEKA on AWS, showing how WEKA's file system integrates with AWS cloud infrastructure.

WEKA’s Performance: The SPECstorage® 2020 Results

SPECstorage 2020 is the touchstone industry-standard benchmark for measuring storage performance under demanding workloads. It covers five workloads: AI, Electronic Design Automation (EDA), GENOMICS, Video Data Acquisition (VDA), and software build (SWBUILD). WEKA's recent cloud results on SPECstorage 2020 demonstrate the value of cloud storage for AI and HPC compared with traditional siloed storage solutions.

Let's dive into the results for each workload in the benchmark.

1. AI Workload: A BIG Leap Forward

The SPECstorage 2020 AI benchmark simulates the storage demands of Artificial Intelligence (AI) image processing workflows. It uses traces collected from systems running popular AI frameworks such as TensorFlow on image workloads (COCO, ResNet-50, Cityscapes). It encompasses several AI training activities, each with a distinct I/O pattern:

  • AI_SF (Small File): Represents frequent reads of small files, typical during model loading and parameter updates.
  • AI_TR (Training): Simulates high-throughput writes for large datasets used in training.
  • AI_TF (Test Features): Models writing intermediate feature data during training.
  • AI_CP (Checkpoint): Less frequent but large writes for saving model checkpoints.

The primary metric is the number of concurrent AI jobs the storage system can sustain while still delivering the operation rates the benchmark requires.

On AWS, WEKA achieved 2,400 jobs in the AI image workload at an Overall Response Time (ORT) of 1.38 milliseconds. Each job mimics real-world AI processing, testing how the storage system performs under the data-intensive tasks typical of image recognition and analysis. This translates into faster model training when using the WEKA Data Platform.
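ORT condenses the latency of a whole benchmark run into a single number. Roughly, it is the area under the response-time-versus-load curve divided by the peak load achieved. The Python sketch below approximates that calculation with the trapezoidal rule. It is an illustration, not SPEC's official implementation, and it makes two assumptions:

- The curve is held flat at the first measured response time between zero load and the first load step.
- The load steps in the example are hypothetical, not WEKA's published data.

```python
from typing import Sequence, Tuple


def overall_response_time(points: Sequence[Tuple[float, float]]) -> float:
    """Approximate ORT: area under the response-time curve divided by peak load.

    points: (load, response_time_ms) pairs, one per load step of a run.
    Assumption: from zero load up to the first step, the curve is held flat
    at the first measured response time.
    """
    if not points:
        raise ValueError("need at least one measurement")
    pts = sorted(points)  # order by load
    first_load, first_rt = pts[0]
    area = first_load * first_rt  # rectangle from zero load to the first step
    for (l0, r0), (l1, r1) in zip(pts, pts[1:]):
        area += (l1 - l0) * (r0 + r1) / 2.0  # trapezoid between adjacent steps
    peak_load = pts[-1][0]
    return area / peak_load


# Hypothetical load steps as (jobs, ms); these are NOT WEKA's published results.
steps = [(240, 0.9), (1200, 1.1), (2400, 2.0)]
print(f"approximate ORT: {overall_response_time(steps):.3f} ms")
```

Because ORT averages over the whole curve, a system whose latency stays flat as load rises scores close to that flat latency. A system whose latency climbs steeply near its peak scores higher.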

This level of performance is a critical success factor for any organization incorporating AI into its applications. With large language models now reaching billions of parameters, the ability to process massive datasets is essential for AI model training, tuning, and inference.

On AWS, the WEKA Data Platform's Converged Mode enables a "zero storage footprint" for use cases such as AI/GenAI model training, (re)tuning, and inference, as well as various HPC workloads. In this mode, WEKA can be safely and predictably deployed across large-scale application and GPU server farms. It shares server resources, including local NVMe storage, with the applications, so no separate storage servers are required.

2. EDA Workload: Surpassing On-Premises Solutions

The EDA benchmark goes beyond a single workload. It separates front-end (design capture, simulation) and back-end (place and route) activities, each with distinct I/O patterns. This enables a more realistic performance evaluation across the entire EDA design process, helping identify storage bottlenecks and assess how well a system scales as design complexity grows.

For the EDA Blended workload, WEKA achieved 6,310 jobs at an ORT of 0.87 milliseconds. By running WEKA on AWS, customers can accelerate their EDA workloads and shorten time to results.

3. Genomics Workload: Accelerating Discovery

The field of genomics thrives on rapid data processing. This workload is heavily skewed toward reads (around 70%), reflecting the need to access large reference genome databases and sequencing data files. It also includes writes (8%), metadata operations (12%), and other file operations, giving a more realistic representation of the workflow. The benchmark uses a typical file size of around 1.6 MB, which is representative of many genomics data formats.
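To make this operation mix concrete, the Python sketch below draws operation types using the approximate GENOMICS weights quoted above. The 10% weight for "other" is an assumption: it simply absorbs the remainder so that the weights sum to one. The sketch models only the op mix, not SPECstorage's actual file sizes, access patterns, or op-rate control. The names `GENOMICS_MIX`, `sample_ops`, and `observed_fractions` are hypothetical.

```python
import random
from collections import Counter

# Approximate GENOMICS mix from the text; "other" (10%) is an assumed remainder.
GENOMICS_MIX = {"read": 0.70, "metadata": 0.12, "write": 0.08, "other": 0.10}


def sample_ops(n: int, mix: dict[str, float], seed: int = 0) -> list[str]:
    """Draw n operation types according to the weighted mix (reproducible via seed)."""
    rng = random.Random(seed)
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


def observed_fractions(ops: list[str]) -> dict[str, float]:
    """Return the share of each operation type in a sampled sequence."""
    counts = Counter(ops)
    return {op: counts[op] / len(ops) for op in counts}


ops = sample_ops(100_000, GENOMICS_MIX)
for op, frac in sorted(observed_fractions(ops).items()):
    print(f"{op:9s} {frac:.3f}")
```

With 100,000 samples, the observed fractions land close to the target weights. A real load generator would pair each drawn operation with a file (around 1.6 MB in this workload) and issue it against the file system under test.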

WEKA achieved 2,200 jobs at an ORT of 0.59 milliseconds, which translates to faster genomics data processing.

4. VDA Workload: Unprecedented Performance

This workload simulates the storage demands of video surveillance systems and other applications that continuously capture and store video streams. The benchmark reflects the high throughput requirements of video capture systems by using large files (around 1 GB).

WEKA's VDA result of 12,000 jobs at an ORT of 3 milliseconds highlights WEKA as a good fit for these demanding, throughput-oriented workloads.

5. SWBUILD Workload: Surpassing Elite On-Premises Systems on Latency

The SWBUILD workload consists of metadata-intensive tests that are derived from real-world applications. It is designed to simulate the kind of continuous integration tasks common in software development, where multiple builds are often processed simultaneously.
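To illustrate what "metadata-intensive" means, the Python sketch below times `os.stat()` calls. This is a metadata-only operation similar to the file checks a build system performs before compiling. It is a toy local micro-benchmark, not SPECstorage SWBUILD. Its results are dominated by the operating system's metadata cache and are not comparable with the ORT figures in this post. The optional `base_dir` argument, a hypothetical parameter of this sketch, lets you point it at a directory on a mounted file system, such as a WEKA mount.

```python
import os
import statistics
import tempfile
import time
from typing import Optional


def stat_latency_ms(num_files: int = 1000, base_dir: Optional[str] = None) -> float:
    """Create num_files empty files under base_dir, then return mean os.stat() latency in ms."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # TemporaryDirectory(dir=None) falls back to the system temp location.
    with tempfile.TemporaryDirectory(dir=base_dir) as root:
        paths = []
        for i in range(num_files):
            path = os.path.join(root, f"src_{i}.c")  # stand-ins for build inputs
            with open(path, "w"):
                pass  # empty file: only metadata matters here
            paths.append(path)

        samples = []
        for path in paths:
            start = time.perf_counter()
            os.stat(path)  # metadata lookup; no file data is read
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    print(f"mean stat latency: {stat_latency_ms():.4f} ms")
```

Build tools issue many such lookups, along with opens, creates, and renames, for each compile step. That is why low metadata latency matters for SWBUILD-style workloads.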

WEKA on AWS achieved 3,500 jobs at an ORT of 0.74 milliseconds. This result shows that WEKA on AWS can deliver low-latency storage performance in the cloud, making it a viable alternative to even the most expensive on-premises systems.

Conclusion

WEKA's SPECstorage 2020 results demonstrate that the cloud is not only practical but also superior for organizations seeking cutting-edge performance. Results across five workloads (AI, EDA, GENOMICS, VDA, SWBUILD) highlight the power of WEKA on AWS. They show that cloud environments can rival, and often surpass, expensive hardware-based storage arrays.

WEKA on AWS offers organizations the flexibility of the cloud without compromising HPC storage performance. As WEKA's results show, the cloud is increasingly the top choice for high-performance needs, redefining what is possible for HPC storage.

Learn more about WEKA in the WEKA Architectural Whitepaper, and read WEKA's customer success stories.


WEKA – AWS Partner Spotlight

WEKA offers a high-performance data platform that accelerates HPC and AI workflows, increases GPU utilization, and reduces storage costs. Organizations in every industry vertical rely on the WEKA Data Platform for performance-intensive AWS workloads, including generative AI, machine learning, financial modeling, life sciences, media rendering, and HPC research.

Disclaimer: All external links referenced in this post point to third-party content that is neither written nor controlled by AWS. The Terraform template referenced in this post is owned by WEKA, and readers should contact WEKA directly with any questions or issues.