AWS Big Data Blog

Category: Serverless

In-stream anomaly detection with Amazon OpenSearch Ingestion and Amazon OpenSearch Serverless

Unsupervised machine learning analytics has emerged as a powerful tool for anomaly detection in today’s data-rich landscape, especially with the growing volume of machine-generated data. In-stream anomaly detection offers real-time insights into data anomalies, enabling proactive response. Amazon OpenSearch Serverless focuses on delivering seamless scalability and management of search workloads; Amazon OpenSearch Ingestion complements this […]

Use Amazon OpenSearch Ingestion to migrate to Amazon OpenSearch Serverless

Amazon OpenSearch Serverless is an on-demand auto scaling configuration for Amazon OpenSearch Service. Since its release, the interest for OpenSearch Serverless had been steadily growing. Customers prefer to let the service manage its capacity automatically rather than having to manually provision capacity. Until now, customers have had to rely on using custom code or third-party […]

Use Amazon Athena with Spark SQL for your open-source transactional table formats

In this post, we show you how to use Spark SQL in Amazon Athena notebooks and work with Iceberg, Hudi, and Delta Lake table formats. We demonstrate common operations such as creating databases and tables, inserting data into the tables, querying data, and looking at snapshots of the tables in Amazon S3 using Spark SQL in Athena.

How FanDuel adopted a modern Amazon Redshift architecture to serve critical business workloads

This post is co-written with Sreenivasa Mungala and Matt Grimm from FanDuel. In this post, we share how FanDuel moved from a DC2 nodes architecture to a modern Amazon Redshift architecture, which includes Redshift provisioned clusters using RA3 instances, Amazon Redshift data sharing, and Amazon Redshift Serverless. About FanDuel Part of Flutter Entertainment, FanDuel Group […]

Introducing persistent buffering for Amazon OpenSearch Ingestion

Amazon OpenSearch Ingestion is a fully managed, serverless pipeline that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and OpenSearch Serverless collections. Customers use Amazon OpenSearch Ingestion pipelines to ingest data from a variety of data sources, both pull-based and push-based. When ingesting data from pull-based sources, such as Amazon Simple […]

Introducing AWS Glue serverless Spark UI for better monitoring and troubleshooting

Today, we are pleased to announce serverless Spark UI built into the AWS Glue console. You can now use Spark UI easily as it’s a built-in component of the AWS Glue console, enabling you to access it with a single click when examining the details of any given job run. There’s no infrastructure setup or teardown required. AWS Glue serverless Spark UI is a fully-managed serverless offering and generally starts up in a matter of seconds. Serverless Spark UI makes it significantly faster and easier to get jobs working in production because you have ready access to low level details for your job runs.

Clean up your Excel and CSV files without writing code using AWS Glue DataBrew

Managing data within an organization is complex. Handling data from outside the organization adds even more complexity. As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. In this blog post, we’ll explore a […]

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it straightforward and cost-effective to analyze all your data at petabyte scale, using standard SQL and your existing business intelligence (BI) tools. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift. Amazon Redshift Serverless makes it effortless to run and […]

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

This post was co-written with Rajiv Arora, Director of Data Science Platform at Gilead Life Sciences. Gilead Sciences, Inc. is a biopharmaceutical company committed to advancing innovative medicines to prevent and treat life-threatening diseases, including HIV, viral hepatitis, inflammation, and cancer. A leader in virology, Gilead historically relied on these drugs for growth but now […]

GoDaddy benchmarking results in up to 24% better price-performance for their Spark workloads with AWS Graviton2 on Amazon EMR Serverless

This is a guest post co-written with Mukul Sharma, Software Development Engineer, and Ozcan IIikhan, Director of Engineering from GoDaddy. GoDaddy empowers everyday entrepreneurs by providing all the help and tools to succeed online. With more than 20 million customers worldwide, GoDaddy is the place people come to name their ideas, build a professional website, […]