AWS Big Data Blog
Category: Serverless
Jumia builds a next-generation data platform with metadata-driven specification frameworks
Jumia is a technology company born in 2012, present in 14 African countries, with its main headquarters in Lagos, Nigeria. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform that ran under a Hadoop distribution to AWS serverless based solutions.
Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless
Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and PPL and SQL, which give you new ways to query your data. In this post, we discuss the benefits of these new features and how to get started.
Enhance Amazon EMR scaling capabilities with Application Master Placement
Starting with the Amazon EMR 7.2 release, Amazon EMR on EC2 introduced a new feature called Application Master (AM) label awareness, which allows users to enable YARN node labels to allocate the AM containers within On-Demand nodes only. In this post, we explore the key features and use cases where this new functionality can provide significant benefits, enabling cluster administrators to achieve optimal resource utilization, improved application reliability, and cost-efficiency in your EMR on EC2 clusters.
Extract insights in a 30TB time series workload with Amazon OpenSearch Serverless
We recently announced a new capacity level of 30TB for time series data per account per AWS Region. The OpenSearch Serverless compute capacity for data ingestion and search/query is measured in OpenSearch Compute Units (OCUs), which are shared among various collections with the same AWS Key Management Service (AWS KMS) key. This post discusses how you can analyze 30TB time series datasets with OpenSearch Serverless.
Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake
In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. As organizations increasingly rely on data stored across various platforms, such as Snowflake, Amazon Simple Storage Service (Amazon S3), and various software as a service (SaaS) applications, the challenge of bringing these […]
Deliver Amazon CloudWatch logs to Amazon OpenSearch Serverless
In this blog post, we will show how to use Amazon OpenSearch Ingestion to deliver CloudWatch logs to OpenSearch Serverless in near real-time. We outline a mechanism to connect a Lambda subscription filter with OpenSearch Ingestion and deliver logs to OpenSearch Serverless without explicitly needing a separate subscription filter for it.
Perform reindexing in Amazon OpenSearch Serverless using Amazon OpenSearch Ingestion
In this post, we outline the steps to copy data between two indexes in the same OpenSearch Serverless collection using the new OpenSearch source feature of OpenSearch Ingestion. This is particularly useful for reindexing operations where you want to change your data schema. OpenSearch Serverless and OpenSearch Ingestion are both serverless services that enable you to seamlessly handle your data workflows, providing optimal performance and scalability.
Analyze more demanding as well as larger time series workloads with Amazon OpenSearch Serverless
In today’s data-driven landscape, managing and analyzing vast amounts of data, especially logs, is crucial for organizations to derive insights and make informed decisions. However, handling this data efficiently presents a significant challenge, prompting organizations to seek scalable solutions without the complexity of infrastructure management. Amazon OpenSearch Serverless lets you run OpenSearch in the AWS […]
Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio
Starting from release 6.14, Amazon EMR Studio supports interactive analytics on Amazon EMR Serverless. You can now use EMR Serverless applications as the compute, in addition to Amazon EMR on EC2 clusters and Amazon EMR on EKS virtual clusters, to run JupyterLab notebooks from EMR Studio Workspaces. EMR Studio is an integrated development environment (IDE) […]
How the GoDaddy data platform achieved over 60% cost reduction and 50% performance boost by adopting Amazon EMR Serverless
This is a guest post co-written with Brandon Abear, Dinesh Sharma, John Bush, and Ozcan IIikhan from GoDaddy. GoDaddy empowers everyday entrepreneurs by providing all the help and tools to succeed online. With more than 20 million customers worldwide, GoDaddy is the place people come to name their ideas, build a professional website, attract customers, […]