AWS Storage Blog

Lyrebird improves performance and reduces costs for generative AI workloads using Amazon S3 Express One Zone

Through accessible mobile apps, Lyrebird Studio aims to transform photography into a creative tool for everyone. Founded in 2011, the company is a leading global developer and software publisher for users who enjoy expressing themselves and creating social content.

To give millions of users a responsive experience with minimal downtime, Lyrebird Studio needs its data storage solution to be highly scalable, cost-efficient, and durable. Since its launch, Lyrebird Studio has used Amazon S3 as its primary storage system, processing and serving millions of images and delivering them to users every day. With a growing user base and more feature-rich app functionality, Lyrebird Studio turned to S3 Express One Zone, a high-performance, single-Availability Zone storage class that delivers consistent single-digit millisecond data access, to lower latency for image reads and writes and ensure continued high performance for its customers.

In this post, we describe how S3 Express One Zone improved Lyrebird Studio’s generative AI image-to-image synthesis infrastructure, helping it process millions of requests per day more efficiently. Using S3 Express One Zone, Lyrebird lowered latencies on its inference pipelines and completed workflow operations 80% faster, which helped reduce its compute costs by 11%. Request costs that are 50% lower than S3 Standard’s also reduced its storage costs by 35%, and together these savings cut the total cost of ownership (TCO) of the solution by 18%. The enhanced performance has been crucial in maintaining a robust user experience, especially because the inference pipelines are layered together during image processing and multiple operations have to occur at high speed.

Understanding Lyrebird’s storage needs

Lyrebird’s main workload involves performing img2img inferences, or synthesizing a new image from a source image and a text prompt, using cutting-edge deep-learning models like Stable Diffusion.

Figure 1: Initial architecture of Lyrebird’s inference pipeline, with Amazon S3 Standard storing images

Here’s an example of the life of an img2img task as it moves through the architecture depicted in Figure 1. Lyrebird uses AWS Organizations for access management and to provide customized environments to different teams. The initial part of the img2img task happens in the Backend System account. When an end user chooses images on their device for image processing, upload URLs are requested from a service running on Amazon ECS. Using one of these pre-signed URLs, the end user’s mobile client uploads an input image to an S3 general purpose bucket. The client then invokes an API to trigger an image processing event, which is sent to an Amazon SNS topic. The machine learning happens in another AWS account in the same organization, which Lyrebird calls the Inference System account because its machine learning models are researched, developed, and deployed there on purpose-built deep learning instances.
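As an illustration of this pattern, the following minimal sketch shows how a backend service can issue a pre-signed PUT URL with boto3. The bucket name, key layout, and function are hypothetical, not Lyrebird’s actual service code:

```python
import uuid

import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: str, expires_in: int = 300) -> dict:
    """Return a pre-signed PUT URL the mobile client can upload an image to."""
    # Hypothetical bucket and key layout; the real naming scheme may differ.
    key = f"uploads/{user_id}/{uuid.uuid4()}.jpg"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "example-input-images", "Key": key},
        ExpiresIn=expires_in,  # URL validity in seconds
    )
    return {"upload_url": url, "key": key}
```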

In the Inference System account, the Amazon SNS message is published to a subscriber, an AWS Lambda function. The Lambda function validates, transforms, and conveys the message to an Amazon SQS queue, which is continuously polled by a pool of Amazon EC2 Inf2 instances. These powerful instances handle the time-consuming processing tasks: the input image is downloaded from S3 object storage into memory, pre-processed, processed, and post-processed, and the result is then uploaded from memory back to object storage.
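The polling loop on the Inf2 instances can be approximated as follows. This is a minimal sketch of the download-process-upload cycle described above; the queue URL, message fields, and run_inference placeholder are assumptions, not Lyrebird’s production code:

```python
import io
import json

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Hypothetical queue URL; the real pipeline's queue and message schema differ.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/img2img-tasks"

def run_inference(image_bytes: bytes, prompt: str) -> bytes:
    """Placeholder for the Stable Diffusion img2img call on the accelerator."""
    raise NotImplementedError

def poll_forever() -> None:
    while True:
        # Long-poll so empty receives don't burn request quota.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            task = json.loads(msg["Body"])
            # Download the input image from S3 into memory, never to disk.
            buf = io.BytesIO()
            s3.download_fileobj(task["input_bucket"], task["input_key"], buf)
            # Pre-process, run the model, post-process.
            output = run_inference(buf.getvalue(), task["prompt"])
            # Upload the synthesized image back to object storage.
            s3.put_object(Bucket=task["output_bucket"], Key=task["output_key"], Body=output)
            # Remove the task only after the result is safely stored.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```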

For Lyrebird’s applications, it is critical to serve output images back to users with the least delay possible to maintain a responsive mobile app experience. As client installs on Android and iOS grew to over 1.5 billion across Lyrebird’s 100+ mobile app products, this rapid growth began to expose performance bottlenecks in the architecture. The overall processing time of a user request depends on the compute-driven inference time and the time taken to transfer data to and from Amazon S3. As mentioned, image processing is done on powerful EC2 Inf2 instances that deliver high performance and low latency for image generation tasks. As a result, a significant portion of the overall processing time comes from the delays caused by image downloads and uploads to storage.

Lyrebird’s solution and revised architecture

In its design approach, Lyrebird considered the performance limits of existing storage solutions in terms of both speed and throughput. Speed is crucial for each component, but throughput is equally important because Lyrebird’s applications receive tens of thousands of requests per minute from around the world. To be viable in a production environment, a candidate storage solution must therefore deliver high performance on both dimensions, handling large volumes of data efficiently.

The latency requirement led Lyrebird to revise and expand its initial infrastructure design shown in Figure 1 to reduce the time taken to perform a single inference and make a new multi-inference pipeline available to mobile applications. Lyrebird identified Amazon S3 Express One Zone as a potential solution, as it offers up to 10x faster data access and up to 50% lower request costs than S3 Standard, and set out to evaluate it through a set of performance tests.

Using Amazon S3 Standard as its testing baseline, Lyrebird constructed an experimental setup to simulate the production environment, enabling multi-inference using three EC2 Auto Scaling groups. The first Auto Scaling group (p0) generates a random file, uploads it to a given storage path, and periodically sends a message containing the file path to an Amazon SQS queue. The second Auto Scaling group (p1) polls the queue, uses the request details in the message to download the generated file from storage, simulates the inference process, and uploads it back to a different location in object storage. The last Auto Scaling group (p2) downloads the file, simulates a second inference on it, and then uploads it to a bucket.

In these experiments, the aim was to understand the limits of Amazon S3 within Lyrebird’s modular service design. In this scenario, 90 p0 nodes generate 18,000 requests per minute. This workload is managed by 180 p1 nodes and 180 p2 nodes. The tests ran for approximately 20 minutes, resulting in over 1 million requests being processed from p0 to p2 nodes. Since there were thousands of requests, the plotted test results used a log scale on the Y-axis.

Figure 2 shows the parameters used in this experiment. The following naming convention was used in the tests:

  • up_p0: Total duration for uploading data to S3 from p0.
  • down_p1: Total duration for downloading data from S3 from p1.
  • up_p1: Total duration for uploading data to S3 from p1.
  • down_p2: Total duration for downloading data from S3 from p2.
  • up_p2: Total duration for uploading data to S3 from p2.
  • p1_p2: Duration between the end of p1’s upload to S3 and the start of p2’s download from S3.
  • total: Duration between the start of p1’s download from S3 and the end of p2’s upload to S3.
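One way to compute these metrics, assuming each node appends timestamps to the message as it flows through the pipeline and the instances share a synchronized clock, is a simple subtraction over the recorded stage boundaries. The field names below are hypothetical:

```python
def metrics(r: dict) -> dict:
    """Derive the benchmark metrics from hypothetical per-request timestamps
    (epoch seconds) recorded at each stage boundary across p0 -> p1 -> p2."""
    return {
        "up_p0":   r["up_p0_end"]   - r["up_p0_start"],
        "down_p1": r["down_p1_end"] - r["down_p1_start"],
        "up_p1":   r["up_p1_end"]   - r["up_p1_start"],
        "down_p2": r["down_p2_end"] - r["down_p2_start"],
        "up_p2":   r["up_p2_end"]   - r["up_p2_start"],
        # Gap between the two inference stages.
        "p1_p2":   r["down_p2_start"] - r["up_p1_end"],
        # From the start of p1's download to the end of p2's upload.
        "total":   r["up_p2_end"]   - r["down_p1_start"],
    }
```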

Figure 2: Parameters for data transfer benchmark test

Testing S3 Standard first, each of the 30 Amazon EC2 instances in p0 generated ten 1 MB files per second, sending a request to the Amazon SQS queue every 100 ms. The other two Auto Scaling groups, with 180 EC2 instances per group, processed the images.
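A p0 load generator of this kind can be sketched as follows. The bucket name and queue URL are placeholders; this only illustrates the pacing and upload-then-notify pattern described above:

```python
import json
import os
import time

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "example-benchmark-input"  # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/bench-p1"  # placeholder

def generate_load() -> None:
    """Upload one random 1 MB object every 100 ms and notify the p1 queue."""
    seq = 0
    while True:
        start = time.monotonic()
        key = f"p0/{os.getpid()}-{seq}.bin"
        # 1 MB of random bytes stands in for an input image.
        s3.put_object(Bucket=BUCKET, Key=key, Body=os.urandom(1024 * 1024))
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
        )
        seq += 1
        # Hold a steady 10 requests per second per instance.
        time.sleep(max(0.0, 0.1 - (time.monotonic() - start)))
```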

Upload times for a 1 MB file were between 290 ms and 500 ms, whereas downloads took around 500 ms. The results are presented in Figure 3. The histograms reveal that while most uploads and downloads completed quickly, some outliers took significantly longer, and the skewness of the distributions suggests that performance can vary considerably. As shown in the bottom right graph, the total duration of an image process spanning two inferences was around 1.6 seconds (excluding the inference time itself), which was not acceptable for Lyrebird’s goal of serving users with the lowest possible response time. The skewed distribution and high median were also a problem in their own right, because they could lead to unstable performance across mobile applications. This highlighted a critical area for improvement in optimizing both the network and the S3 configuration to meet Lyrebird’s stringent performance criteria.

Figure 3: Results of S3 Standard tests showing response time of 1662 ms

Lyrebird then conducted the same experiment using S3 Express One Zone instead of S3 Standard. With S3 Express One Zone offering up to 10x faster data access, the results were promising: up to a 95% decrease in download times and up to an 82% decrease in upload times. As shown in Figure 4, faster image retrieval and upload reduced the total duration, measured from the start of the p1 group’s download to the end of the p2 group’s upload, to 321 ms, which is 80% faster than the previous setup with S3 Standard. Another observation was that the graph for the total duration of a multi-inference request had a lower median and a much narrower distribution than in the previous experiment, indicating more consistent completion times.

Figure 4: Results of S3 Express One Zone tests showing a response time of 321 ms

Lyrebird revised its infrastructure so that its inference pipeline uses S3 Express One Zone as a storage layer. It can now perform multiple inferences on an input image and provide users with higher quality, higher resolution, and higher context images without read and write latencies impacting the user experience. Figure 5 shows the revised architecture with S3 Express One Zone.

Figure 5: Revised production architecture with S3 Express One Zone for storing images
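Directory buckets use a zone-suffixed naming convention, and recent AWS SDK versions manage the S3 Express One Zone session authentication behind the ordinary object APIs, so switching the storage layer can be as small as changing the bucket name. A minimal sketch with a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Directory bucket names embed the Availability Zone ID and end in --x-s3.
# This name is a hypothetical example, not Lyrebird's actual bucket.
DIRECTORY_BUCKET = "example-inference--use1-az4--x-s3"

# Recent AWS SDK versions manage the S3 Express One Zone session
# (CreateSession) transparently behind the ordinary object APIs.
s3.put_object(Bucket=DIRECTORY_BUCKET, Key="jobs/123/input.jpg", Body=b"...")
obj = s3.get_object(Bucket=DIRECTORY_BUCKET, Key="jobs/123/input.jpg")
image_bytes = obj["Body"].read()
```

Because S3 Express One Zone stores data in a single Availability Zone, colocating the Inf2 instances in the same zone as the directory bucket is what keeps data access in the single-digit millisecond range.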

Results

Lyrebird observed a remarkable improvement in its production performance using S3 Express One Zone, with image retrieval times decreasing by 95% and upload times decreasing by 82%. In addition to the performance improvements, S3 Express One Zone reduced storage costs by 35%, because its request costs are up to 50% lower than S3 Standard’s. The enhanced efficiency of the system also led to a significant reduction in EC2 Inf2 compute costs, which decreased by 11% after Lyrebird moved its production workloads to S3 Express One Zone. Faster retrieval and upload times shorten the running time of compute resources, reducing the time accelerators spend on image processing tasks.

Some additional refactoring was needed, such as copying the uploaded image from the S3 general purpose bucket to an S3 directory bucket. This was an insignificant cost given the cost and performance improvements on image retrievals and uploads from the EC2 instances that process the images. Copying the object allows Lyrebird to keep using S3 Event Notifications with S3 Standard for the event-driven processing that happens when mobile clients upload images.
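The copy step can be implemented as a small Lambda function subscribed to the general purpose bucket’s S3 Event Notifications; CopyObject accepts a directory bucket as the destination. A minimal sketch, with a hypothetical directory bucket name:

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical destination directory bucket.
DIRECTORY_BUCKET = "example-inference--use1-az4--x-s3"

def handler(event, context):
    """Copy each newly uploaded image from the general purpose bucket
    into the S3 directory bucket used by the inference pipeline."""
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event payloads are URL-encoded.
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            Bucket=DIRECTORY_BUCKET,
            Key=src_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
```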

Furthermore, a detailed analysis reveals significant long-term savings. The move to Amazon S3 Express One Zone is projected to save Lyrebird 18% in TCO, considering factors such as cost avoidance from reduced compute resource usage and reduced storage costs. Lyrebird’s strategic investment in S3 Express One Zone thus demonstrates significant long-term value, optimizing infrastructure while enhancing service delivery capabilities.

Conclusion

In this post, we discussed the latency and throughput improvements Lyrebird Studio achieved using the S3 Express One Zone storage class for generative AI image-to-image workloads. Lyrebird Studio integrated S3 Express One Zone to address its low latency requirements, leading to significant improvements in both performance and cost efficiency and allowing it to handle millions of daily requests more efficiently.

For Lyrebird, S3 Express One Zone helps ensure that millions of mobile users experience minimal downtime and enjoy a responsive service every time they use a Lyrebird mobile application. Additionally, the cost savings of 35% and 11% for storage and compute, respectively, and the overall 18% reduction in TCO have allowed Lyrebird to re-allocate resources to new projects and to further improve its user experience in other areas.

Lyrebird Studio uses AI to transform users’ everyday lives into a creative adventure. You can find out more at lyrebirdstudio.net. You can also get started with Amazon S3 Express One Zone by reviewing the S3 User Guide.

Hüseyin Temiz

Hüseyin Temiz is a Machine Learning Engineer at Lyrebird Studio, specializing in developing mobile applications using computer vision and generative deep learning technologies. He focuses on providing mobile users with advanced image editing and synthesis experiences.

Furkan Akkurt

Furkan Akkurt is a Cloud DevOps Engineer at Lyrebird Studio with a broad background in backend development and machine learning. Furkan specializes in generative AI and in building, deploying, and managing large-scale GenAI applications on AWS. He is also passionate about serverless architectures and system design.

Mark Twomey

Mark is a Senior Solutions Architect at AWS focused on storage and data management. He enjoys working with customers to put their data in the right place, at the right time, for the right cost. Living in Ireland, Mark enjoys walking in the countryside, watching movies, and reading books.