
2024
Observe.AI

Observe.AI Cuts Costs by Over 50% with Machine Learning on AWS

Observe.AI developed and open-sourced the One Load Audit Framework on AWS to optimize machine learning model costs, boost developer efficiency, and scale to meet data growth.

50%

lower costs by fine-tuning instance sizes

10x

higher data loads supported

From weeks to hours

Reduced development time

Overview

Observe.AI uses conversation intelligence to uncover insights from customer interactions, both live and post-call, helping companies increase contact center agent performance. The company developed and open-sourced the One Load Audit Framework (OLAF), which integrates with Amazon SageMaker to automatically find bottlenecks and performance problems in machine learning services.

Using OLAF to load-test Amazon SageMaker instances, Observe.AI reduced machine learning costs by over 50 percent, lowered development time from one week to hours, and facilitated on-demand scaling to support a tenfold growth in data load size.


Opportunity | Predicting ML Data Load Sizes for Enhanced Efficiency

Observe.AI optimizes the customer experience through an artificial intelligence (AI)-powered workforce platform. Employing a large language model (LLM) designed for contact centers, Observe.AI enhances contact center agent performance and extracts insights from customer interactions using conversation intelligence. Each month, the platform processes millions of conversations and generates hundreds of inferences per conversation.

As machine learning (ML) adoption continues to grow across industries, testing the performance of customers’ ML services under varying data loads has become increasingly crucial for Observe.AI. Aashraya Sachdeva, staff engineer in machine learning at Observe.AI, says, "While onboarding new customers, we were assessing our ML system's capability to handle a tenfold increase in data load, corresponding to the tenfold rise in conversations processed daily. Our ML engineers and scientists faced challenges in accurately predicting this capability when transitioning models from research to production."

The company sought to deploy a larger ML model in production for enhanced accuracy. Simultaneously, there was a careful effort to manage latency and control costs associated with the implementation. Achieving an optimal return on investment through fine-tuning its infrastructure was key, and the business wanted a solution compatible with its existing Amazon Web Services (AWS) environment.

"We sought a more straightforward method to identify the optimal infrastructure, assess our readiness for increased load, and determine the associated costs for serving code to customers. We also wanted precise insights into the developer time required for implementation," Aashraya explains.


“Through fine-tuning Amazon SageMaker instance sizes with OLAF while maintaining a constant data input load, we optimized costs for our LLM deployment by over 50 percent. This process ensured the best return on investment.”

Aashraya Sachdeva
Staff Engineer, Machine Learning at Observe.AI

Solution | Building the One Load Audit Framework on AWS

To address its challenge of predicting ML load sizes, Observe.AI created and open-sourced the One Load Audit Framework (OLAF). Integrated with Amazon SageMaker, a service for building, training, and deploying ML models for any use case, OLAF identifies bottlenecks and performance issues in ML services, offering latency and throughput measurements under both static and dynamic data loads. The framework also seamlessly incorporates ML performance testing into the software development lifecycle, facilitating accurate provisioning and cost savings.

Aashraya explains, "OLAF provides our ML engineers and scientists with a plug-and-play model. They simply input their AWS credentials and the Amazon SageMaker endpoint, and the tool conducts load testing, providing latency numbers and expected errors for a particular model or instance."
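The plug-and-play flow Aashraya describes (point the tool at an endpoint, fire a load, report latencies and errors) can be sketched as a small concurrent harness. This is a hypothetical illustration, not OLAF's actual API; the `invoke` callable stands in for a SageMaker endpoint call, such as a wrapper around boto3's `sagemaker-runtime` `invoke_endpoint`.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(invoke, payloads, concurrency=8):
    """Send payloads to `invoke` concurrently; return latency stats and errors.

    `invoke` is a stand-in for a SageMaker endpoint call (hypothetical
    interface -- not OLAF's real API).
    """
    latencies, errors = [], 0

    def one(payload):
        start = time.perf_counter()
        try:
            invoke(payload)
            return time.perf_counter() - start, None
        except Exception as exc:  # count failed requests instead of aborting
            return time.perf_counter() - start, exc

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for elapsed, exc in pool.map(one, payloads):
            latencies.append(elapsed)
            if exc is not None:
                errors += 1

    latencies.sort()
    return {
        "requests": len(latencies),
        "errors": errors,
        "p50_ms": 1000 * statistics.median(latencies),
        "p95_ms": 1000 * latencies[int(0.95 * (len(latencies) - 1))],
    }

# Example with a stubbed endpoint: ~1 ms of simulated work per request.
report = load_test(lambda p: time.sleep(0.001), [b"x"] * 100)
```

Swapping the stub for a real endpoint invoker yields the per-instance latency and error numbers used to compare configurations.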

Following the initial build, Observe.AI integrated Amazon SageMaker features into OLAF, including multi-container deployment and batch inferencing. "We wanted to understand how these incremental features affect scalability in terms of cost," adds Aashraya. Next, the company incorporated Amazon Simple Queue Service (Amazon SQS), a fully managed message queuing service for microservices, distributed systems, and serverless applications. By downloading Amazon SQS load traces, OLAF users can observe the rate at which ML messages enter the system to predict data load size. Aashraya notes, "This feature assists us in easily testing queue-based array processing systems, which are becoming more prevalent."
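The Amazon SQS trace feature boils down to recovering the message arrival rate over time, so a load test can replay a realistic shape rather than a flat rate. A minimal sketch of that bucketing step is below; the trace format (a flat list of epoch-second timestamps) is an assumption for illustration, not Amazon SQS's actual export schema.

```python
from collections import Counter

def arrival_rates(timestamps, bucket_s=60):
    """Bucket message timestamps (epoch seconds) into fixed windows and
    return messages-per-second for each window -- the load shape a test
    would replay against an ML endpoint."""
    buckets = Counter(int(ts // bucket_s) for ts in timestamps)
    return {b * bucket_s: count / bucket_s for b, count in sorted(buckets.items())}

# Synthetic trace: 120 messages in the first minute, 30 in the second.
trace = [i * 0.5 for i in range(120)] + [60 + i * 2 for i in range(30)]
rates = arrival_rates(trace)
peak = max(rates.values())  # messages/sec to target when sizing a test
```

The peak rate from such a trace is what a queue-based system must sustain, which makes it a natural target load for the endpoint test above.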

Finally, Observe.AI integrated Amazon Simple Notification Service (Amazon SNS), a fully managed service for application-to-application and application-to-person messaging that helps OLAF users replicate specific patterns within Amazon SNS.

Outcome | Optimizing Costs and Boosting Developer Efficiency

Launched in 2022, OLAF by Observe.AI is now actively employed by dozens of ML engineers and researchers for testing and predicting data loads. By using OLAF, Observe.AI has cut LLM costs by conducting load tests on Amazon SageMaker instances, identifying the most suitable configuration aligned with the company’s business metrics. Aashraya explains, "Our research team encountered higher costs than anticipated when deploying an LLM, as well as other ML models, with specific latency and throughput requirements into production. However, through fine-tuning Amazon SageMaker instance sizes with OLAF while maintaining a constant data input load, we optimized costs for our ML model deployments by over 50 percent. This process ensured the best return on investment."
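The "constant load, varying instance size" comparison reduces to cost per inference at each instance's measured throughput. A minimal sketch follows; the instance names, hourly prices, and throughputs are hypothetical placeholders, not published SageMaker pricing or Observe.AI's measurements.

```python
def cost_per_million(price_per_hour, throughput_rps):
    """USD to serve one million inferences at full utilization."""
    seconds_needed = 1_000_000 / throughput_rps
    return price_per_hour * seconds_needed / 3600

# Hypothetical candidates: (hourly price USD, load-tested requests/sec).
# Illustrative numbers only -- substitute real load-test results.
candidates = {
    "ml.g5.xlarge": (1.41, 40.0),
    "ml.g5.2xlarge": (1.69, 45.0),
    "ml.p3.2xlarge": (3.83, 55.0),
}
costs = {name: cost_per_million(p, r) for name, (p, r) in candidates.items()}
best = min(costs, key=costs.get)
```

Note that the fastest instance is not necessarily the cheapest per inference; the comparison only becomes obvious once throughput is measured under the same load, which is the gap OLAF fills.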

Previously, Observe.AI developers had to write multiple scripts and construct numerous pipeline workflows, resulting in a complex array of onboarding data transfers and debugging systems. Aashraya notes, "Because OLAF is tightly integrated with AWS, it now only takes developers a few hours to determine the proper instance for use, a task that used to take one week. As a result, developers can allocate more time to testing data loads and creating new features."

With the integration of OLAF, Observe.AI can scale its services to accommodate a tenfold increase in data load. The company can now conduct stress testing more easily and accurately, providing valuable assistance to customers who have augmented their data loads. Aashraya explains, "If a customer doubles their data load, we now have a clearer understanding of our infrastructure's capacity. Using OLAF and AWS, we can replicate and precisely increase the load by 100 percent, anticipating potential breakpoints or database issues. This not only helps us better prepare our customers for such scenarios but also brings internal cost and development benefits."


About Observe.AI

Observe.AI is a solution for boosting contact center performance through live conversation intelligence. Utilizing a robust 30-billion-parameter contact center large language model (LLM) and a generative AI engine, Observe.AI extracts valuable insights from every customer interaction. Trusted by companies, Observe.AI is a valued partner in accelerating positive results across the entire business landscape.

AWS Services Used

Amazon SageMaker

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) for any use case.

Learn more »

Amazon Simple Queue Service

Amazon Simple Queue Service (Amazon SQS) lets you send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.

Learn more »

Amazon Simple Notification Service

Amazon Simple Notification Service (Amazon SNS) sends notifications in two ways: application-to-application (A2A) and application-to-person (A2P). A2A provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications.

Learn more »

More Software & Internet Customer Stories


  • United States

    Collaboration.Ai Achieves DoD Authorization in Less Than 90 Days, Saves $2 Million with Second Front, Chainguard, and AWS

    Collaboration.Ai, a leader in AI-driven innovation management software, sought to expand into the US federal government sector, particularly the Department of Defense (DoD), with its end-to-end CrowdVector innovation management platform. Obtaining the required Authority to Operate (ATO) on DoD networks—a complex, lengthy, and costly process—posed a significant challenge for the startup. By engaging with AWS Partner Second Front, a member of the AWS Global Security & Compliance Acceleration (GSCA) program and using its Game Warden DevSecOps platform on AWS GovCloud (US), Collaboration.Ai secured a Certificate to Field (CTF) for DoD networks in less than 90 days. The deployment was accelerated further by using a custom registry of secure, minimal container images from AWS Partner Chainguard. This approach saved nearly two years of work and $2 million in compliance costs, swiftly opening access to the US federal government market.

    2025
  • Israel

    Reducing Overall Costs by 50% and Improving Performance Using AWS Graviton with Logz.io

    Learn how Logz.io, a provider of AI-powered observability, improved efficiency and performance using AWS Graviton.
    2025
  • Americas

    Achieving Near-Zero Downtime and Powering Generative AI Using Amazon EKS with Ada

    As its self-managed clusters grew in size and complexity, software company Ada Support Inc. (Ada) significantly increased operational efficiency by migrating to Amazon EKS. Ada helps organizations resolve more customer inquiries with less effort using its customer-service-automation solution that is powered by artificial intelligence (AI). The company sought fully managed services to off-load the heavy lifting and reduce downtime for customers during upgrades. Using Amazon EKS, Ada empowers its engineers to focus less on maintenance and more on company improvements, such as increasing the deployment velocity by 70 percent and developing agents that are powered by generative AI.
    2025
  • Taiwan

    Noodoe Boosts EV Charging Station Revenues by 10–25% with Generative AI Advisor on Amazon Bedrock

    Learn how Noodoe uses generative AI on Amazon Bedrock to help EV charging station operators optimize pricing strategies and drive revenue growth.

    2025

Get Started

Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.