AWS Cloud Operations Blog

How Stripe architected massive scale observability solution on AWS

This post is co-written with Cody Rioux, Staff Engineer at Stripe and Michael Cowgill, Staff engineer at Stripe Stripe powers online and in-person payment processing and provides financial solutions for businesses of all sizes. Stripe operates a sophisticated microservice environment built on top of AWS. In this blog post we will cover the journey and […]

Strengthen application resilience with myApplications and AWS Resilience Hub

Introduction Today, organizations prioritize managing their applications over infrastructure, focusing on business outcomes while leveraging automation and cloud services to handle the underlying infrastructure. They seek to consolidate key application metrics like health, security, cost, and performance from AWS services such as AWS Security Hub or Amazon CloudWatch. These organizations also need to ensure their […]

Sign-in to AWS Console Mobile Application with an AWS Access Portal or third-party IdP URL

AWS customers rely on the AWS Console Mobile Application to monitor, manage, and receive notifications to stay informed about their AWS resources while away from their desktop devices. Customers who use Single-Sign-On (SSO) can face a unique set of challenges while signing into the AWS Console Mobile Application. While SSO can offer enhanced security and […]

Exploring AWS Config data using Amazon Athena and Amazon Managed Grafana

This post is co-written with Jacob Rickerd, Principal Security Engineer at Attentive. The post walks through an example dashboard that Attentive, an AI-powered mobile marketing platform, uses for resource inventory, serving as a starting point for you to build comprehensive dashboards tailored to your environment and tag policies. Attentive is the AI-powered SMS and email […]

Support for Amazon CloudWatch Evidently ending soon

After careful consideration, we have made the decision to discontinue CloudWatch Evidently, effective 10/17/2025. Active customers will be able to use the service as normal until 10/17/2025, when support for the service will end. During this period, we will continue to provide critical security patches, but will no longer support any limit increase requests. On […]

Streamline change processes ­and improve governance with AWS Well-Architected

The AWS Well-Architected Framework (WA Framework) is designed to help cloud architects build secure, resilient, high-performing, and efficient workloads on AWS. It is structured around six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. Figure 1. The pillars of AWS Well-Architected Framework This post provides insights on how to streamline your change-management […]

Centralized monitoring and alerting for AWS Systems Manager Agent status on managed nodes across AWS Organization

Has the AWS Systems Manager Agent (SSM Agent) running on your critical servers on-premises or on Amazon Elastic Compute Cloud (Amazon EC2) lost healthy connection to AWS Systems Manager (SSM) for some reason and you wanted to be proactively notified when this happens? Do you wish to improve observability of your SSM Agent status and […]

Analyzing your custom metrics spend contributors in Amazon CloudWatch

With an ever-growing volume of custom metrics in Amazon CloudWatch, customers often find it difficult to understand and manage their spend on this service. One of the most common questions they have is how to identify which metrics contribute the most to their spend in CloudWatch. This blog post introduces a solution that lets you […]

Enhanced dashboard, latency suggestions in Amazon CloudWatch Internet Monitor

Amazon CloudWatch Internet Monitor provides near-continuous internet measurements for your internet traffic, including availability and performance metrics, tailored to your specific workload footprint on AWS. With Internet Monitor, you can get insights into average internet performance metrics over time, as well as get alerts for issues (health events). You’re notified about events that impact your end […]

Bootstrap your chaos engineering journey with AWS Fault Injection Service Scenarios Library

Ensuring the reliability and resilience of applications is crucial for maintaining business continuity, delivering a superior customer experience, and staying compliant with industry regulations. As defined in the AWS Well-Architected Framework Reliability Pillar, testing reliability plays an important role in ensuring reliability. Chaos engineering is a powerful way to not only test how your systems […]