AWS Big Data Blog

Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode

In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.

Accelerate your analytics with Amazon S3 Tables and Amazon SageMaker Lakehouse

Amazon SageMaker Lakehouse is a unified, open, and secure data lakehouse that now seamlessly integrates with Amazon S3 Tables, the first cloud object store with built-in Apache Iceberg support. In this post, we guide you how to use various analytics services using the integration of SageMaker Lakehouse with S3 Tables.

Build unified pipelines spanning multiple AWS accounts and Regions with Amazon MWAA

In this blog post, we demonstrate how to use Amazon MWAA for centralized orchestration, while distributing data processing and machine learning tasks across different AWS accounts and Regions for optimal performance and compliance.

Integrate ThoughtSpot with Amazon Redshift using AWS IAM Identity Center

In this post, we walk you through the process of setting up ThoughtSpot integration with Amazon Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.

Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

We’re excited to introduce a new enhancement to the search experience in Amazon SageMaker Catalog, part of the next generation of Amazon SageMaker—exact match search using technical identifiers. In this post, we demonstrate how to streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio.

Apache Flink + Prometheus

Process millions of observability events with Apache Flink and write directly to Prometheus

In this post, we explain how the new connector works. We also show how you can manage your Prometheus metrics data cardinality by preprocessing raw data with Flink to build real-time observability with Amazon Managed Service for Prometheus and Amazon Managed Grafana.

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

This post demonstrates how to implement reliable concurrent write handling mechanisms in Iceberg tables. We will explore Iceberg’s concurrency model, examine common conflict scenarios, and provide practical implementation patterns of both automatic retry mechanisms and situations requiring custom conflict resolution logic for building resilient data pipelines. We will also cover the pattern with automatic compaction through AWS Glue Data Catalog table optimization.

Optimize multimodal search using the TwelveLabs Embed API and Amazon OpenSearch Service

In this blog post, we show you the process of integrating TwelveLabs Embed API with OpenSearch Service to create a multimodal search solution. You’ll learn how to generate rich, contextual embeddings from video content and use OpenSearch Service’s vector database capabilities to enable search functionalities. By the end of this post, you’ll be equipped with the knowledge to implement a system that can transform the way your organization handles and extracts value from video content.

Correlate telemetry data with Amazon OpenSearch Service and Amazon Managed Grafana

In this post, we show you how to use Amazon OpenSearch Service and Amazon Managed Grafana to correlate the various observability signals that improve root cause analysis, thereby resulting in reduced Mean Time to Resolution (MTTR). We also provide a reference solution that can be used at scale for proactive monitoring of enterprise applications to avoid a problem before they occur.