AWS Cloud Operations Blog
Category: Compute
Enable cloud operations workflows with generative AI using Agents for Amazon Bedrock and Amazon CloudWatch Logs
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible […]
Serverless Governance of Software Deployed with AWS Service Catalog
AWS Service Catalog (Service Catalog) is a powerful tool that empowers organizations to manage and govern approved services and resources. It significantly benefits platform engineering by standardizing environments, accelerating service delivery, and enhancing security. With its automated provisioning and resource management, Service Catalog supports infrastructure as code, enabling scalable, reliable deployments. Platform engineering teams are […]
Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock
As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]
Visualize AWS Systems Manager Patch Manager information using Amazon QuickSight
In this blog post, learn how to build an Amazon QuickSight dashboard to visualize critical patch and inventory information to speed up MTTR. Also, you can use filters to search for a specific AWS Account, specific AWS Region, Amazon Elastic Compute Cloud (Amazon EC2) name, or check installed/missed packages. You want to visualize system patching […]
Leverage Amazon Q to upgrade Lambda runtime functions
Cloud operations are at the heart of every organization. Operating in the cloud allows IT teams to focus on business outcomes, optimizing IT processes while accelerating software development and innovation. These days, it is no longer a question if your organization is moving to the cloud, but how quickly you can move with security and […]
Use AWS Systems Manager Automation runbooks to resolve Elastic Block Store related operational tasks
Customers have been using various forms of automation for years to define a sequence of actions on Amazon Elastic Block Store (EBS). While before, customers were facing operational overhead related to EBS tasks, AWS Systems Manager (SSM) Automations can now be leveraged to meet a wide variety of customer use cases. In this blog post, a […]
Automate your Multicloud operations with AWS Systems Manager and AWS Lambda
A multicloud strategy presents various challenges, including observing and managing applications and infrastructure across multiple cloud platforms. Maintaining consistent tooling for visualizing operational data and automating actions helps organizations address this challenge. Amazon CloudWatch and AWS Systems Manager are two services that provide unified monitoring, observability, and automation capabilities for workloads deployed on AWS, on-premises, […]
Developing an AWS Service Catalog self-managed engine for governance
AWS Service Catalog lets you centrally manage your cloud resources to achieve governance at scale of your Infrastructure as Code (IaC) templates. AWS Service Catalog supports AWS CloudFormation natively and allows customers to use other IaC such as Terraform Community and Terraform Cloud via Service Catalog reference engine. We often hear customers asking how to […]
How to perform Failover and Failback using AWS Elastic Disaster Recovery (AWS DRS) between VMware and AWS environments
Enterprises face a variety of threats such as natural disasters, cyber-attacks and technology failures that could severely disrupt operations. A comprehensive disaster recovery plan is crucial to quickly respond and recover from these events. In this blog post, we’ll show how to plan and implement a comprehensive disaster recovery solution between your VMware on-premises environment […]
Understanding AWS High Availability and Replication for vSphere Administrators
Introduction vSphere HA is a fundamental and frequently used feature of vSphere. If any of several failure scenarios occur, it restarts a virtual machine. The failure scenarios range from VM or host crashes to unresponsive hosts (for example, due to network isolation or outage). Translating vSphere High Availability (HA) to the public cloud can be […]