AWS Cloud Operations Blog
Top Announcements for AWS Cloud Operations at re:Invent 2024
1. Transform operational investigations
Automation and intelligent operations are pivotal to building the future of cloud operations. To help you overcome the growing complexity of modern software and make operations more intelligent and efficient, AWS introduced new generative AI-powered capabilities that can help you simplify routine tasks and accelerate operational investigations across your AWS environment.
Amazon Q Developer adds operational investigation capability (Preview)
Amazon Q Developer now helps you accelerate operational investigations across your AWS environment in just a fraction of the time. With a deep understanding of your AWS cloud environment and resources, Amazon Q Developer looks for anomalies in your environment, surfaces related signals for you to explore, identifies potential root-cause hypotheses, and suggests next steps to help you remediate issues faster. Amazon Q Developer works alongside you throughout your operational troubleshooting journey, from issue detection and triage through remediation.
You can initiate an investigation by selecting the Investigate action on any Amazon CloudWatch data widget across the AWS console. CloudWatch now provides a dedicated investigation experience where teams can collaborate and add findings, view related signals and anomalies, and review suggestions for potential root cause hypotheses. This new capability also provides remediation suggestions for common operational issues across your AWS environment by surfacing relevant AWS Systems Manager Automation runbooks, AWS re:Post articles, and documentation.
AWS CloudTrail Lake announces two AI-powered capabilities
AWS announced two AI-powered enhancements to AWS CloudTrail Lake. These new capabilities simplify log analysis, enabling deeper insights and quicker investigations across your AWS environments. 1/ AI-powered natural language query generation in CloudTrail Lake is now generally available and allows you to ask questions about your AWS activity in plain English, without writing complex SQL queries. 2/ AI-powered query result summarization, now in preview, provides natural language summaries of your query results, regardless of whether the query was generated through the natural language query generation feature or written manually in SQL. For example, after running a query to find users with the most access denied requests, you can click “Summarize” to get a concise overview of the key findings.
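To make the "access denied" example concrete, here is a minimal sketch of the kind of SQL such a question might translate to. It is not output from the query generator. The function names, the event data store ID, and the timestamp are placeholders chosen for illustration, and the optional `run_query` helper assumes boto3 is installed and AWS credentials are configured.

```python
def top_access_denied_query(event_data_store_id: str, since: str, limit: int = 10) -> str:
    """Build a CloudTrail Lake SQL statement that ranks principals by AccessDenied errors.

    event_data_store_id and since are illustrative inputs. They are interpolated
    directly into the statement, so pass only trusted values.
    """
    return (
        "SELECT userIdentity.arn AS principal, COUNT(*) AS denied_requests "
        f"FROM {event_data_store_id} "
        f"WHERE errorCode = 'AccessDenied' AND eventTime > '{since}' "
        "GROUP BY userIdentity.arn "
        "ORDER BY denied_requests DESC "
        f"LIMIT {limit}"
    )


def run_query(statement: str) -> str:
    """Optionally submit the statement to CloudTrail Lake and return the query ID."""
    import boto3  # imported lazily so the query builder works without boto3

    client = boto3.client("cloudtrail")
    return client.start_query(QueryStatement=statement)["QueryId"]


if __name__ == "__main__":
    # Placeholder event data store ID and start time.
    print(top_access_denied_query("EXAMPLE-EVENT-DATA-STORE-ID", "2024-12-01 00:00:00"))
```

Some services report authorization failures under other error codes, so a real investigation may need a broader `errorCode` filter than the single value assumed here.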
2. Transform how you govern
To transform cloud operations, you need to start with the right set of tools and governance frameworks. These tools should give you consistent visibility and enable you to prevent unwanted and noncompliant actions. Furthermore, you should be able to apply controls easily and at scale, so you can prevent drift and improve your security and compliance posture.
Node management experience in AWS Systems Manager
The new node management experience in AWS Systems Manager helps you scale operational efficiency by simplifying node management, making it easier to manage nodes running anywhere, whether they are EC2 instances, hybrid servers, or servers running in a multicloud environment. You now have a comprehensive, centralized view to easily manage all your nodes at scale, and you can also identify, diagnose, and remediate unmanaged nodes. Systems Manager is also now integrated with Amazon Q Developer, which extends your ability to see and control your nodes from anywhere in the AWS console.
Resource control policies
Resource control policies (RCPs) in AWS Organizations are a type of preventative control that helps you centrally establish a data perimeter across your AWS environment. With RCPs, you can centrally restrict external access to your AWS resources at scale. For example, an RCP can help enforce the requirement that “no principal outside my organization can access Amazon S3 buckets in my organization,” regardless of the permissions granted through individual bucket policies. RCPs complement service control policies (SCPs), an existing type of organization policy. While SCPs offer central control over the maximum permissions for IAM roles and users in your organization, RCPs offer central control over the maximum permissions on AWS resources in your organization. You can also deploy RCP-based configurable managed controls using AWS Control Tower.
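As an illustration of the S3 requirement quoted above, the following sketch builds a policy document of the shape an RCP might take. The function name, the `Sid`, and the organization ID are placeholders, and the policy should be treated as a starting point to validate against your own access patterns rather than a ready-to-attach control.

```python
import json


def build_s3_org_perimeter_rcp(org_id: str) -> dict:
    """Return an RCP-style document denying S3 access to principals outside org_id.

    The second condition exempts AWS service principals, which do not belong
    to any organization and would otherwise be denied as well.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceOrgIdentitiesForS3",  # illustrative name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # "o-exampleorgid" is a placeholder organization ID.
    print(json.dumps(build_s3_org_perimeter_rcp("o-exampleorgid"), indent=2))
```

Because the statement is a `Deny`, it caps what bucket policies can grant. It does not grant any access on its own.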
Declarative Policies
Declarative Policies is another new type of preventative control in AWS Organizations. These policies simplify the way customers enforce durable intent, such as baseline configuration for AWS services within their organization. Declarative policies are designed to prevent actions that are non-compliant with the policy. For example, customers can use declarative policies to configure EC2 to only allow instance launches using AMIs vended by specific providers, or block public access in their VPC for their entire organization. The configuration defined in the declarative policy is maintained even when services add new APIs or features, or when customers add new principals or accounts to their organization. Declarative policies today support EC2, EBS and VPC configurations. AWS also announced managed controls implemented using Declarative Policies in AWS Control Tower.
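The VPC example above can be sketched as a small policy document. The key names (`ec2_attributes`, `vpc_block_public_access`, `internet_gateway_block`) and the `@@assign` operator reflect our understanding of the declarative policy syntax and should be checked against the current AWS Organizations documentation before use. The function name and its parameters are hypothetical.

```python
import json

# Modes assumed to be accepted for VPC block public access; verify against the docs.
ALLOWED_MODES = {"off", "block_ingress", "block_bidirectional"}


def build_vpc_block_public_access_policy(
    mode: str = "block_ingress", exclusions_allowed: bool = False
) -> dict:
    """Return a declarative-policy-style document for VPC block public access."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {
        "ec2_attributes": {
            "vpc_block_public_access": {
                "internet_gateway_block": {
                    "mode": {"@@assign": mode},
                    "exclusions_allowed": {
                        "@@assign": "enabled" if exclusions_allowed else "disabled"
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_vpc_block_public_access_policy(), indent=2))
```

Unlike an SCP or RCP, a document like this describes the desired configuration state rather than a list of denied API actions, which is why it keeps applying as services add new APIs.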
Enhanced event selectors on AWS CloudTrail Lake
AWS enhances event filtering in AWS CloudTrail Lake, which is a managed data lake that helps you capture, immutably store, access, and analyze your activity logs, as well as AWS Config configuration items. Enhanced event filtering expands upon existing filtering capabilities, giving you even greater control over which CloudTrail events are ingested into your event data stores. This enhancement increases the efficiency and precision of your security, compliance, and operational investigations while helping reduce costs.
Centralized resource context and quick actions in AWS Resource Explorer
AWS announced the general availability of a new console experience in AWS Resource Explorer that centralizes resource insights and properties from AWS services. With this release, you now have a single console experience to use simple keyword-based search for your AWS resources, view relevant resource properties, and confidently take action to organize your resources. You can now inspect resource properties, resource-level cost with AWS Cost Explorer, AWS Security Hub findings, AWS Config compliance and configuration history, event timelines with AWS CloudTrail, and a relationship graph showing connected resources.
3. Transform how you observe
To operate efficiently at any scale, observability is business-critical. You need visibility to act decisively, identify root causes of issues quickly, and operate more efficiently. AWS announced new capabilities to help you observe your applications, infrastructure, networks, databases, and containers. Highlights include:
Amazon CloudWatch adds context to observability data in service console, accelerating analysis
Amazon CloudWatch now adds context to observability data, making it much easier for IT operators, application developers, and Site Reliability Engineers (SREs) to navigate related telemetry, visualize relationships between resources, and accelerate analysis. This new feature transforms disparate metrics and logs into near real-time insights, helping you identify the root cause of issues faster and improve operational efficiency. With this feature, Amazon CloudWatch now automatically visualizes the relationships within observability data and underlying AWS resources, such as Amazon EC2 instances and AWS Lambda functions.
Amazon CloudWatch adds network performance monitoring for AWS workloads using flow monitors
Amazon CloudWatch Network Monitoring now allows you to monitor network performance of your AWS workloads by using flow monitors. The new feature provides near real-time visibility of network performance for workloads between compute instances such as Amazon EC2 and Amazon EKS, and AWS services such as Amazon S3, Amazon RDS, and Amazon DynamoDB, enabling you to rapidly detect and attribute network-driven impairments for your workloads. CloudWatch Network Monitoring uses flow monitors to provide TCP-based performance metrics for packet loss and latency, and network health indicators of your AWS workloads to help you quickly pinpoint the root cause of issues.
Amazon CloudWatch Database Insights for Amazon Aurora
AWS now provides Amazon CloudWatch Database Insights with support for Amazon Aurora PostgreSQL and Amazon Aurora MySQL. Database Insights is a database observability solution that provides a curated experience designed for DevOps engineers, application developers, and database administrators (DBAs) to expedite database troubleshooting and gain a holistic view into their database fleet health. Database Insights consolidates logs and metrics from your applications, your databases, and the operating systems on which they run into a unified view in the console.
Amazon CloudWatch Container Insights launches enhanced observability for Amazon ECS
Amazon CloudWatch Container Insights introduces enhanced observability for Amazon Elastic Container Service (ECS) running on Amazon EC2 and AWS Fargate, with out-of-the-box detailed metrics from cluster level down to container level, to deliver faster problem isolation and troubleshooting. Enhanced observability enables customers to visually drill up and down across various container layers and directly spot issues like memory leaks in individual containers, reducing mean time to resolution.
Amazon CloudWatch launches observability solutions for AWS services and workloads on AWS
Observability solutions help you get up-and-running faster with infrastructure and application monitoring at AWS. Using observability solutions, you can select from a catalog of available solutions that deliver focused observability guidance for AWS services and common workloads such as Java Virtual Machine (JVM), Apache Kafka, Apache Tomcat, or NGINX. Solutions cover monitoring tasks including installing and configuring Amazon CloudWatch agent, deploying pre-defined custom dashboards and setting metric alarms.
Amazon CloudWatch adds centralized visibility into telemetry configurations
Amazon CloudWatch now offers centralized visibility into critical AWS service telemetry configurations, including Amazon VPC Flow Logs, Amazon EC2 Detailed Metrics, and AWS Lambda Traces. This enhanced visibility enables central DevOps teams, system administrators, and service teams to identify potential gaps in their infrastructure monitoring setup. The telemetry configuration auditing experience seamlessly integrates with AWS Config to discover AWS resources, and can be turned on for the entire organization using the new AWS Organizations integration with Amazon CloudWatch.
AWS Fault Injection Service now generates experiments reports
AWS Fault Injection Service (AWS FIS) now generates reports for experiments, reducing the time and effort to produce evidence of resilience testing. The report summarizes experiment actions and captures application response from a customer-provided Amazon CloudWatch Dashboard. With AWS FIS, you can run fault injection experiments to create realistic failure conditions under which to practice your disaster recovery and failover tests.
4. Transform how you analyze
It can be time-consuming to analyze performance issues or pinpoint root causes from raw operational data. You need scalability to store all the raw data and query engines to index and analyze it, all without having to copy the data between different systems. Read on to learn how AWS is improving the search and analytics experience with new capabilities in Amazon CloudWatch and Amazon OpenSearch Service, along with zero-ETL integrations, so you can have the best of AWS solutions to enhance observability and analysis.
Amazon CloudWatch Application Signals with complete visibility into application transaction spans
AWS announces the general availability of an enhanced search and analytics experience in CloudWatch Application Signals. This feature empowers developers and on-call engineers with complete visibility into application transaction spans, which are the building blocks of distributed traces that capture detailed interactions between users and various application components. Developers can answer any questions related to application performance or end-user impact through an interactive visual editor and enhancements to Logs Insights queries. CloudWatch Logs offers advanced features for transaction spans, including data masking, forwarding via subscription filters, and metric extraction.
Amazon OpenSearch Service zero-ETL integration with Amazon CloudWatch
Amazon Web Services announces a new integrated analytics experience and zero-ETL integration between Amazon CloudWatch and Amazon OpenSearch Service for customers to get the best of both services. CloudWatch customers can now leverage OpenSearch’s Piped Processing Language (PPL) and OpenSearch SQL. Additionally, CloudWatch customers can accelerate troubleshooting with out-of-the-box curated dashboards for vended logs like Amazon Virtual Private Cloud (VPC), AWS CloudTrail, and AWS Web Application Firewall (WAF). OpenSearch customers can now analyze CloudWatch Logs without having to duplicate data.
Amazon OpenSearch Service zero-ETL integration with Amazon Security Lake
Amazon OpenSearch Service now offers a zero-ETL integration with Amazon Security Lake, enabling you to query and analyze security data in-place directly through OpenSearch. This integration allows you to efficiently explore voluminous data sources that were previously cost-prohibitive to analyze, helping you streamline security investigations and obtain comprehensive visibility of your security landscape. By offering the flexibility to selectively ingest data and eliminating the need to manage complex data pipelines, you can now focus on effective security operations while potentially lowering your analytics costs.
Next-gen Amazon OpenSearch Service UI for enhanced data exploration and collaboration
Amazon OpenSearch Service launches a modern operational analytics experience that enables users to gain insights into data spanning managed domains and serverless collections from a single endpoint. The new OpenSearch analytics experience helps users gain insights from their operational data by providing purpose-built features for observability, security analytics, essentials, and search use cases. The launch also includes Workspaces to enhance collaboration and productivity, allowing teams to create dedicated spaces. Users can access the latest UI enhancements regardless of the version of the underlying managed cluster or collection.
AWS CloudTrail Lake launches enhanced analytics and cross-account data access
AWS announces two significant enhancements to AWS CloudTrail Lake: 1/ Comprehensive dashboard capabilities, including a new “Highlights” dashboard that provides an at-a-glance overview of your AWS activity logs, including AI-powered insights (AI-powered insights is in preview). 2/ Cross-account sharing of event data stores, a feature that allows you to securely share your event data stores with select IAM identities using Resource-Based Policies (RBP).
Conclusion
At re:Invent 2024, AWS announced powerful new capabilities to help you make operations more secure, agile, efficient, and intelligent. With new AI-powered capabilities, such as those available in Amazon Q Developer (in preview) and in AWS CloudTrail Lake, you can accelerate operational investigations across your AWS environment. You can improve your foundation with governance capabilities such as enhanced node management in Systems Manager and new preventative controls with Declarative Policies and resource control policies. To address observability challenges, AWS introduced new CloudWatch features, including added context for telemetry, network flow monitoring, Database Insights for Amazon Aurora, and enhanced observability for Amazon ECS. With application transaction spans in CloudWatch, new zero-ETL integrations, and improvements in OpenSearch Service, you can transform how you analyze operational and security data. Finally, AWS is connecting capabilities and services, removing the need for you to build context, copy data, manage policies, maintain ETL pipelines, and more. The result is a more integrated and unified experience, so you can focus on innovating and delivering useful applications to your customers.