Overview
The AWS Glue Connector for Apache Hudi simplifies creating and updating Apache Hudi tables from AWS Glue. The connector can be used with both the Copy on Write (CoW) and Merge on Read (MoR) storage types.
Highlights
- Helps create both CoW and MoR Apache Hudi tables in the AWS Glue Data Catalog
- Automatically adds partitions when new data is added to a partitioned table
Details
Delivery details
Glue 3.0
- Amazon ECS
- Amazon EKS
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
Apache Hudi Connector 0.10.1-2 for AWS Glue.
- This version is built with Apache Hudi 0.10.1.
- This version is compatible with AWS Glue 3.0.
Additional details
Usage instructions
Subscribe to the product in AWS Marketplace, then activate the Glue connector from AWS Glue Studio.
How to use the connector
You can use the Hudi connector in the following ways:
- DynamicFrame with connection options
- DataFrame (for example, spark.read and df.write)
See details: Apache Hudi documentation
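As a rough illustration of the DataFrame route, the sketch below builds a set of Hudi write options and wraps the spark.read and df.write calls. The table name, field names, and helper function names are hypothetical placeholders. The option keys are standard Hudi datasource settings, and the table type is set to Copy on Write because of the limitation noted for this connector version. It assumes the code runs inside a Glue 3.0 job where a SparkSession is available and the connector JARs are on the classpath.

```python
def hudi_write_options(table_name, record_key, precombine_field, partition_field=None):
    """Build Hudi datasource write options for a Copy on Write table."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    }
    if partition_field:
        # Only set the partition path field for partitioned tables.
        options["hoodie.datasource.write.partitionpath.field"] = partition_field
    return options


def write_hudi(df, path, options):
    """Upsert a Spark DataFrame into the Hudi table at `path` (df.write)."""
    df.write.format("hudi").options(**options).mode("append").save(path)


def read_hudi(spark, path):
    """Load the Hudi table at `path` as a Spark DataFrame (spark.read)."""
    return spark.read.format("hudi").load(path)


# Hypothetical table and column names, for illustration only.
opts = hudi_write_options("orders", "order_id", "updated_at", "order_date")
```

In a job script you would call write_hudi(df, path, opts) with your own DataFrame and S3 path. The functions are kept separate from the option builder so that the options can be inspected without a Spark session.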
Connection options
You can pass the following options to the connector:
- path (required): The S3 location of the data.
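The following is a minimal sketch of passing connection options through a DynamicFrame. Only path comes from the list above. The "marketplace.spark" connection type, the className value, the connectionName key, and the helper function names are assumptions based on how Glue marketplace connectors are commonly invoked, so check them against your own connection before relying on them. The bucket and connection names are hypothetical.

```python
def hudi_connection_options(path, connection_name, hudi_options=None):
    """Assemble connection options for the Hudi connector."""
    options = {
        "className": "org.apache.hudi",      # assumption: Hudi datasource class name
        "connectionName": connection_name,   # assumption: name of your Glue connection
        "path": path,                        # required: data location on S3
    }
    # Any extra Hudi settings (table name, record key, ...) are merged in.
    options.update(hudi_options or {})
    return options


def write_dynamic_frame(glue_context, dynamic_frame, connection_options):
    """Write a DynamicFrame through the marketplace connector."""
    return glue_context.write_dynamic_frame.from_options(
        frame=dynamic_frame,
        connection_type="marketplace.spark",  # assumption: marketplace connector type
        connection_options=connection_options,
    )


def read_dynamic_frame(glue_context, connection_options):
    """Read a DynamicFrame through the marketplace connector."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",  # assumption: marketplace connector type
        connection_options=connection_options,
    )


# Hypothetical bucket and connection names, for illustration only.
connection_options = hudi_connection_options(
    "s3://example-bucket/hudi/orders/",
    "my-hudi-connection",
    {"hoodie.table.name": "orders"},
)
```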
Job configurations
You need to set the following job configuration:
- Dependent JARs path (--extra-jars): /tmp/*
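If you create the job programmatically rather than in the console, the setting above might be expressed as a default argument. The sketch below builds the keyword arguments for the boto3 Glue create_job call. The job name, role ARN, script location, connection name, and helper function name are hypothetical, and the sketch only assembles the request without sending it.

```python
def glue_job_arguments(job_name, role_arn, script_location, connection_name):
    """Build create_job keyword arguments for a Glue 3.0 job using the connector."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": "3.0",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Attach the Glue connection created when activating the connector.
        "Connections": {"Connections": [connection_name]},
        # Dependent JARs path, as required by the connector.
        "DefaultArguments": {"--extra-jars": "/tmp/*"},
    }


# Hypothetical values, for illustration only.
job_args = glue_job_arguments(
    "hudi-etl-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/hudi_job.py",
    "my-hudi-connection",
)

# To create the job: boto3.client("glue").create_job(**job_args)
```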
IAM configuration
To use this marketplace connector, the IAM role for your Glue ETL job needs additional permissions.
See details: Permissions required for using connectors
VPC configuration
VPC jobs
To use this marketplace connector from your VPC jobs, you need to satisfy the following conditions:
- Configure the network options of the Glue connection with your VPC, the private subnet, and the security group.
- Configure the route table of the private subnet to route traffic to a NAT gateway. This is required because the job needs to download the marketplace container image from the Amazon ECR repository.
See details: Configure a VPC for your ETL job
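One way to sanity-check the NAT gateway condition is to inspect the subnet's route table. The sketch below is an illustration only: it takes a response in the shape returned by the EC2 describe_route_tables API and looks for a default route that targets a NAT gateway. The function name and sample data are hypothetical, and the check does not cover subnets that fall back to the VPC's main route table.

```python
def has_nat_default_route(route_tables_response):
    """Return True if any route table has a 0.0.0.0/0 route via a NAT gateway."""
    for table in route_tables_response.get("RouteTables", []):
        for route in table.get("Routes", []):
            is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
            if is_default and route.get("NatGatewayId"):
                return True
    return False


# Hypothetical response, shaped like the describe_route_tables output.
sample_response = {
    "RouteTables": [
        {
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
            ]
        }
    ]
}

# In practice the response could come from:
# boto3.client("ec2").describe_route_tables(
#     Filters=[{"Name": "association.subnet-id", "Values": ["<your-subnet-id>"]}]
# )
nat_ok = has_nat_default_route(sample_response)
```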
Non-VPC jobs
To use this marketplace connector from your non-VPC jobs, you do not need to add VPC configuration on the Glue connection. You can leave network options blank.
Limitations
- Currently, the Apache Hudi Connector 0.10.1-2 for Glue 3.0 does not support Hudi MoR tables.
Resources
Support
Vendor support
Please allow 24 hours
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.