AWS Cloud Operations & Migrations Blog

Category: Technical How-to

Gain operational insights for NVIDIA GPU workloads using Amazon CloudWatch Container Insights

As machine learning models grow more advanced, they require extensive computing power to train efficiently. Many organizations are turning to GPU-accelerated Kubernetes clusters for both model training and online inference. However, properly monitoring GPU usage is critical for machine learning engineers and cluster administrators to understand model performance and to optimize infrastructure utilization. Without visibility […]

Automate CloudWatch Dashboard creation for your AWS Elemental MediaPackage and AWS Elemental MediaLive

Monitoring the health and performance of your media services is critical to ensuring a seamless viewing experience for your customers. Amazon CloudWatch provides powerful monitoring capabilities for Amazon Web Services (AWS) resources. Setting up comprehensive dashboards can be a time-consuming process, especially for organizations managing a large number of resources across multiple Regions. The Automatic CloudWatch […]

Testing Amazon Cognito-backed APIs using Amazon CloudWatch Synthetics


Customers who develop APIs can control access to them by using Amazon Cognito user pools as an authorizer. To effectively validate that these APIs are working, tests must account for the additional security controls in place, and Amazon CloudWatch Synthetics enables proactive testing of such APIs. If you are using Amazon Cognito User […]


Improve application reliability with effective SLOs

At AWS, we consider reliability to be the capability of a service to withstand major disruptions within acceptable degradation parameters and to recover within an acceptable timeframe. To achieve this goal, service reliability goes beyond traditional disciplines such as availability and performance. Components of a system or application will eventually fail over time. Like our CTO Werner Vogels […]

Enhancing observability with a managed monitoring solution for Amazon EKS


Keeping a watchful eye on your Kubernetes infrastructure is crucial for ensuring optimal performance, identifying bottlenecks, and troubleshooting issues promptly. In the ever-evolving world of cloud-native applications, Amazon Elastic Kubernetes Service (Amazon EKS) has emerged as a popular choice for deploying and managing containerized workloads. However, monitoring Kubernetes clusters can be challenging due to their […]

Auditing generative AI workloads with AWS CloudTrail

As generative AI is incorporated into every aspect of how we use technology, a common question customers ask is how to properly audit generative AI services on AWS, such as Amazon Bedrock, Amazon SageMaker, Amazon Q Developer, and Amazon Q Business. In this post, we will demonstrate common scenarios that […]

Augmenting mainframe data with IBM MQ and Amazon Managed Streaming for Apache Kafka

In this post, we explore the approach of integrating mainframe IBM MQ with Amazon Managed Streaming for Apache Kafka (Amazon MSK) to migrate your applications into a cloud-based consumer model. Amazon MSK is a fully managed Apache Kafka service from AWS that makes it simpler to set up and operate Kafka in the cloud. […]

How to automate application log ingestion from Amazon EKS on Fargate into AWS CloudTrail Lake


Customers often look for ways to capture and centrally store application logs from Amazon Elastic Kubernetes Service on Fargate (Amazon EKS on Fargate) Pods so they can investigate root causes or analyze security incidents. Customers also want to be able to query those logs easily to assist with security investigations. In this blog post, we show you […]

Monitor Java apps running on Tomcat server with Amazon CloudWatch Application Signals (Preview)

Traditionally, Java web applications are packaged into Web Application Resource (WAR) files, which can be deployed on any Servlet/JSP container, such as Tomcat server. These applications often operate within distributed environments, involving multiple interconnected components such as databases, external APIs, and caching layers. Monitoring the performance and health of Java web applications can be challenging due […]

Securely administer servers migrated with AWS Application Migration Service using AWS Systems Manager Session Manager


In this blog post, we will illustrate how to automate the configuration necessary to manage migrated servers with improved security and reduced costs. To administer servers in an on-premises environment, administrators often use secure shell (SSH) or Remote Desktop Protocol (RDP) to connect. After migrating to Amazon Web Services (AWS), this may not be […]