AWS Cloud Operations Blog
Category: Amazon Managed Service for Prometheus
How Stripe architected massive scale observability solution on AWS
This post is co-written with Cody Rioux, Staff Engineer at Stripe, and Michael Cowgill, Staff Engineer at Stripe. Stripe powers online and in-person payment processing and provides financial solutions for businesses of all sizes. Stripe operates a sophisticated microservice environment built on top of AWS. In this blog post we will cover the journey and […]
Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers
Managing and operating monitoring systems for containerized applications, including tasks such as metrics collection, can be a significant operational burden for customers. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]
Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock
As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]
Enhancing observability with a managed monitoring solution for Amazon EKS
Keeping a watchful eye on your Kubernetes infrastructure is crucial for ensuring optimal performance, identifying bottlenecks, and troubleshooting issues promptly. In the ever-evolving world of cloud-native applications, Amazon Elastic Kubernetes Service (Amazon EKS) has emerged as a popular choice for deploying and managing containerized workloads. However, monitoring Kubernetes clusters can be challenging due to their […]
How StormForge reduces complexity and ensures scalability with Amazon Managed Service for Prometheus
This blog post was co-written by Brent Eager, Senior Software Engineer at StormForge. StormForge is the creator of Optimize Live, a Kubernetes vertical rightsizing solution that is compatible with the Kubernetes HorizontalPodAutoscaler (HPA). Using cluster-based agents, machine learning, and Amazon Managed Service for Prometheus, Optimize Live is able to continuously calculate and apply optimal resource requests, […]
Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics
With the rising popularity of applications hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a key challenge is handling increases in traffic and load efficiently. Traditionally, you would have to manually scale out your applications by adding more instances, an approach that's time-consuming, inefficient, and prone to over- or under-provisioning. A better […]
VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus
VTEX is a multi-tenant platform with a distributed engineering operation. Observing hundreds of services in real time in an efficient manner is a technical challenge for the business. In this blog post, we will show how VTEX created a resilient, open-source-based architecture aligned with a sharding strategy, using Amazon Managed Service for Prometheus (AMP) to […]
How Unitary achieved automatic metric collection with Amazon Managed Service for Prometheus collector
This post was co-authored with Nicolas Fournier, Platform Engineer at Unitary. Every day, over 80 years’ worth of video content is uploaded online. Some of this content can also be harmful. Unitary knows that human moderators are the current gold standard for moderation, but this manual approach does not scale. While automated systems can scale, […]
Multi-tenant monitoring across accounts and regions using Amazon Managed Service for Prometheus
In this guest blog post, Nauman Noor (Managing Director), Fabio Dias (Cloud Developer), and Dylan Alibay (Cloud Developer) from the platform engineering team at State Street discuss their use of Amazon Managed Service for Prometheus and AWS Distro for OpenTelemetry to enable monitoring in a multi-tenant, multi-account, and multi-region environment. In the ever-evolving financial services landscape, State […]
What’s new in AWS Observability at re:Invent 2023
Let’s recap the week at AWS re:Invent 2023 with a round-up of the AWS Observability launches across Amazon CloudWatch, Amazon Managed Grafana, and Amazon Managed Service for Prometheus. From automatic instrumentation and operation of applications in CloudWatch, to agentless scraping of Prometheus metrics in Managed Service for Prometheus, read on to learn about the features […]