AWS Cloud Operations Blog
Bootstrap your chaos engineering journey with AWS Fault Injection Service Scenarios Library
Ensuring the reliability and resilience of applications is crucial for maintaining business continuity, delivering a superior customer experience, and staying compliant with industry regulations. As defined in the AWS Well-Architected Framework Reliability Pillar, testing reliability plays an important role in ensuring reliability. Chaos engineering is a powerful way to not only test how your systems […]
Managing access to AWS accounts from Microsoft Teams and Slack at scale using AWS Organizations and AWS Chatbot
Customers use chat collaboration applications like Microsoft Teams and Slack to collaborate and manage their AWS applications. AWS Chatbot is a ChatOps service that enables customers to monitor, troubleshoot issues, and manage AWS applications from chat channels. AWS Chatbot provides autonomy and customizability to DevOps teams operating their AWS environments on the go from chat […]
Accelerating migrations and IT Tasks for DKB using AWS Systems Manager
Deutsche Kreditbank AG (DKB), one of Germany’s largest direct banks with over five million customers. In 2023, DKB migrated their back-office IT infrastructure to Amazon Web Services (AWS). This Included their diverse infrastructure, backup, networking, and both Windows and Linux servers, while managing risks like downtime, data integrity, and security vulnerabilities. Customers in regulated industries […]
Centrally detect and investigate security findings with AWS Organizations integrations
Detecting security risks and investigating the corresponding findings is essential for protecting your AWS environment from potential threats, ensuring the confidentiality, integrity, and availability of your data and resources for your business needs. AWS provides a range of governance and security services such as AWS Organizations, AWS Control Tower, and AWS Config along with many others, […]
Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers
Managing and operating monitoring systems for containerized applications can be a significant operational burden for customers such as metrics collection. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]
Ingesting administrative logs from Microsoft Azure to AWS CloudTrail Lake
In January 2023, AWS announced the support of ingestion for activity events from non-AWS sources using CloudTrail Lake. Making CloudTrail Lake a single location of immutable user and API activity events for auditing and security investigations. AWS CloudTrail Lake is a managed data lake for capturing, storing, accessing, and analyzing user and API activity on […]
Enable cloud operations workflows with generative AI using Agents for Amazon Bedrock and Amazon CloudWatch Logs
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible […]
Ten features for efficiently managing your AWS applications from Microsoft Teams and Slack using AWS Chatbot
Ten features in AWS Chatbot to help you understand your application health and resolve issues faster from chat channels.
Serverless Governance of Software Deployed with AWS Service Catalog
AWS Service Catalog (Service Catalog) is a powerful tool that empowers organizations to manage and govern approved services and resources. It significantly benefits platform engineering by standardizing environments, accelerating service delivery, and enhancing security. With its automated provisioning and resource management, Service Catalog supports infrastructure as code, enabling scalable, reliable deployments. Platform engineering teams are […]
Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock
As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]