AWS Partner Network (APN) Blog

Revolutionize data landscape with HCLTech’s Intelligent Ingestion solution for rapid ETL and beyond

By Chinmaya Ranjan Mohanty, Senior Solution Architect – HCLTech
By Subramanian Thiyagarajan, Technical Architect – HCLTech
By Sandeep Roy, Practice Director – HCLTech
By Partha Sarathi Das, Solution Architect – HCLTech
By Jerry Li, Shishir Choudhary, Sr. Partner Solution Architect – AWS


Introduction

HCLTech introduces the “Intelligent Ingestion” solution, powered by AWS native services, to streamline rapid Extract, Transform, and Load (ETL) processes and simplify the entire data ingestion cycle.

Business Needs and Opportunities

In today’s data-driven world, organizations constantly struggle to handle an ever-growing number of data sources, each with its own data formats and schema structure, which makes data ingestion more complex and time-consuming. According to survey-based research, although the share of time spent on ETL has dropped from 60-80% to 45%, data collection and cleansing still take considerable effort and time from data scientists.

HCLTech’s Intelligent Ingestion solution provides an automated low-code to no-code approach that simplifies ETL build effort for both batch and real-time data ingestion workloads. It is built on a rich set of AWS services, including AWS Step Functions, AWS Glue, AWS Glue DataBrew, AWS Lambda, AWS Lake Formation, Amazon Kinesis, Amazon S3, and Amazon Simple Notification Service (SNS), to achieve seamless data integration, transformation, and quality assurance. The solution automates ETL ingestion end to end from a single click (event trigger) and provides reusable ETL workflows for rapid ETL development. It also brings quick, actionable insights to business decision making.

The Intelligent Ingestion solution provides a centralized data catalog that acts as a single source of truth for all data assets. It also allows organizations that actively ingest data for ETL to manage and track schema changes (additions, deletions, modifications, and so on) over time, and its dynamic alert mechanism notifies users of any change to an existing source schema. These features enable effective data discovery, simplified audit and governance, and better collaboration with flexible schema adoption.
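
As a rough illustration of the schema tracking described above, the sketch below compares two versions of a source schema and classifies the differences as additions, deletions, or modifications. It assumes each schema version is available as a plain column-name to data-type mapping (in the solution these versions would come from the AWS Glue Data Catalog). The function names and sample columns are hypothetical and are not taken from the solution itself.

```python
from typing import Dict, List


def diff_schema(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column_name: data_type} mappings and classify the changes."""
    added = sorted(col for col in new if col not in old)
    deleted = sorted(col for col in old if col not in new)
    # A column is "modified" when it exists in both versions with a different type.
    modified = sorted(col for col in old if col in new and old[col] != new[col])
    return {"added": added, "deleted": deleted, "modified": modified}


def needs_alert(changes: Dict[str, List[str]]) -> bool:
    """Return True when any kind of schema change was detected."""
    return any(changes.values())


# Hypothetical schema versions for one source table.
previous = {"order_id": "bigint", "amount": "double", "region": "string"}
current = {"order_id": "bigint", "amount": "decimal(10,2)", "channel": "string"}

changes = diff_schema(previous, current)
print(changes)
print("alert users:", needs_alert(changes))
```

Keeping the comparison separate from the alert decision means the same diff can feed both an audit trail and a notification.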

HCLTech’s Intelligent Ingestion solution can integrate with ServiceNow for automated incident creation and workflow management when data ingestion fails. It does so through HCLTech’s iONA (iAct) solution, which provides seamless integration with ServiceNow for incident reporting and management.

HCLTech is an AWS Premier Service Partner and Managed Service Provider (MSP) uniquely positioned to help enterprises as a Global System Integrator (GSI) and an ISV. HCLTech is supercharging progress for hundreds of leading global enterprises, vested in solving day-to-day or complex challenges with a dedicated full-stack business unit. HCLTech also holds AWS Generative AI, Migration, DevOps, SAP, Storage and Mainframe Modernization Competencies and is an MSP Partner.

Solution Overview

HCLTech’s Intelligent Ingestion provides a set of re-usable ETL processes that ingest data from any supported data source into the desired target, such as Amazon S3 or a data warehouse like Amazon Redshift.

This solution uses a single job to trigger the entire ETL flow, including data ingestion, data quality, curation, and enrichment. It encourages a standard, parameterized approach for handling all data types and sources, with provision for concurrent ETL job executions, which makes it a fully scalable solution for managing bulk ingestion. It also separates the transformation logic from the ETL process, making that logic re-usable, scalable, and maintainable.
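
One way to picture the separation of transformation logic from the ETL process is a small registry of named transformations that a generic, parameter-driven runner applies in order. The sketch below is illustrative only: the registry, the transformation names, and the `job_params` keys are assumptions made for this example, and plain Python lists stand in for the Spark DataFrames an AWS Glue job would use.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]

# Hypothetical registry: transformation logic lives here, outside the runner.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str) -> Callable[[Transform], Transform]:
    """Decorator that makes a transformation available under a name."""
    def decorator(func: Transform) -> Transform:
        TRANSFORMS[name] = func
        return func
    return decorator


@register("drop_null_ids")
def drop_null_ids(rows: List[Row]) -> List[Row]:
    return [row for row in rows if row.get("id") is not None]


@register("uppercase_region")
def uppercase_region(rows: List[Row]) -> List[Row]:
    return [{**row, "region": str(row["region"]).upper()} for row in rows]


def run_pipeline(rows: List[Row], job_params: Dict[str, object]) -> List[Row]:
    """Generic runner: it knows nothing about the individual transformations."""
    for name in job_params["transforms"]:
        rows = TRANSFORMS[name](rows)
    return rows


# The same runner serves any source; only the parameters change.
job_params = {
    "source": "sales_db.orders",   # hypothetical source name
    "target_zone": "curated",      # hypothetical target zone label
    "transforms": ["drop_null_ids", "uppercase_region"],
}
sample = [{"id": 1, "region": "emea"}, {"id": None, "region": "apac"}]
result = run_pipeline(sample, job_params)
print(result)
```

Adding a new source then means supplying a new parameter set rather than writing a new job.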

This solution comprises the following key pillars, each with several features that are crucial for building a robust and complete ingestion solution:

  1. Schema Evolution

  • Significance: Schema versioning and control
  • Services used: AWS Glue Data Catalog
  • Rationale: Managing data schema changes without disrupting existing ETL pipelines
  • Key features: Fully managed; Ensures data integrity during schema changes; Enhances traceability between different schema versions; Notifies users of schema changes through the dynamic alert mechanism; Schema changes don’t impact existing ETL workflows.

  2. Fully Automated ETL Pipeline

  • Significance: Accelerates ETL efficiency and saves time. Consistent and reliable data ingestion, transformations, and aggregations. Agile and cost-effective.
  • Services used: AWS Glue, Amazon Kinesis
  • Rationale: No code; Improved time-to-insights; Centralized control and monitoring of ETL pipelines; Pay-as-you-go pricing
  • Key features: Fully managed; Highly scalable and elastic to handle varying data workloads; Reduces human error, improving overall efficiency; AWS Glue and Amazon Kinesis support a wide range of data sources and data formats; Improves operational efficiency for both batch and real-time ingestion.

  3. Re-Usable ETL Workflow & Transformation

  • Significance: Promotes ETL consistency. Easily pluggable ETL workflow, completely customizable for any ETL process. Cost-effective and saves time. Promotes code re-usability and quick deployment.
  • Services used: AWS Step Functions, Spark ETL (AWS Glue Spark and PySpark jobs)
  • Rationale: Code standardization; Promotes agile development with reusable code
  • Key features: Fully managed; Promotes code re-usability, saving considerable development effort; Quick data source inclusion and effective application of transformations; Innovative design that can hold industry-specific, pre-built generic transformations targeting specific use cases in data ingestion, curation, enrichment, data quality, and so on.

  4. Centralized Data Catalog

  • Significance: Single source of truth for all source data; Data Lineage; Data Audit and Compliance
  • Services used: AWS Glue Data Catalog
  • Rationale: Data Discovery and Lineage; Manage metadata on ingested data sources
  • Key features: Fully managed; Greater control over the ingested data with complete lineage for audit and compliance; Easy to discover and collaborate using the centralized data catalog

  5. Data Quality (DQ) and Governance

  • Significance: Enhanced data quality automation; Provisioning of custom DQ rules; Supports a wide range of DQ check transformations for improved data quality and standardization.
  • Services used: AWS Glue DataBrew, AWS Lake Formation
  • Rationale: Improved data consistency and accuracy for actionable insights using high-quality data rules; No-code/low-code transformation; Enhanced Data Quality monitoring
  • Key features: Fully managed; Improves data accuracy with extensive data quality monitoring; Provides rich data governance and integrates easily with the broader AWS ecosystem; Automated DQ jobs ensure efficient Data Quality validations with reliable data processing.

  6. Dynamic Alert Mechanism, Incident Reporting & Management

  • Significance: Seamless integration with ServiceNow to automatically raise an incident for every high-severity alert; Dynamic alert mechanism provisioned at every stage of the ETL process for both success and failure.
  • Services used: Amazon SNS, AWS Lambda, HCLTech’s iONA (iAct) solution, ServiceNow
  • Rationale: HCLTech’s in-house iONA (iAct) solution offers seamless integration with ServiceNow, which the Intelligent Ingestion solution leverages for incident reporting and management; Amazon SNS handles all dynamic alerts
  • Key features: Fully integrated with HCLTech’s iONA (iAct) solution to auto-create an incident in ServiceNow for a failure at any ETL layer (ingestion, DQ, curation, or enrichment), with appropriate email alerts to the authorized users; Amazon SNS triggers email alerts on success or failure at any stage of the ETL process.
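
The alerting behavior in the last pillar (an email alert for every success or failure, plus a high-severity incident for failures) can be sketched as a small routing function. This is a minimal sketch under stated assumptions: the event fields, stage names, and action dictionaries are hypothetical, and the ServiceNow incident is only planned, not created, because the iONA (iAct) interface is not described in this post. The email path uses the standard `publish` call of the boto3 SNS client, imported lazily so the routing logic can be exercised without AWS access.

```python
from dataclasses import dataclass
from typing import Dict, List

STAGES = ("ingestion", "data_quality", "curation", "enrichment")


@dataclass
class EtlEvent:
    stage: str
    succeeded: bool
    detail: str


def plan_notifications(event: EtlEvent) -> List[Dict[str, str]]:
    """Decide which notifications an ETL stage event should trigger."""
    if event.stage not in STAGES:
        raise ValueError(f"unknown ETL stage: {event.stage}")
    status = "SUCCEEDED" if event.succeeded else "FAILED"
    # Every stage outcome, success or failure, produces an email alert.
    actions = [{
        "channel": "email",
        "subject": f"ETL {event.stage} {status}",
        "message": event.detail,
    }]
    # Only failures additionally produce a high-severity incident request.
    if not event.succeeded:
        actions.append({
            "channel": "incident",
            "severity": "high",
            "short_description": f"ETL {event.stage} failed",
        })
    return actions


def send_email_alert(topic_arn: str, action: Dict[str, str]) -> None:
    """Publish an email action to an Amazon SNS topic (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the planning logic runs without AWS access
    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Subject=action["subject"],
        Message=action["message"],
    )


ok_actions = plan_notifications(EtlEvent("curation", True, "Curation job finished"))
failed_actions = plan_notifications(EtlEvent("data_quality", False, "DQ rule check failed"))
print(ok_actions)
print(failed_actions)
```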

Solution Architecture and Process Flow Diagram

HCLTech’s Intelligent Ingestion solution leverages native AWS services to define and process end-to-end ETL workflow in a highly efficient and cost-effective manner.

The following diagram explains the architecture design of the Intelligent Ingestion solution on AWS:

Figure 1 – Intelligent Ingestion solution – Architecture Diagram

HCLTech’s Intelligent Ingestion solution starts the process by invoking the crawler state machine, which runs AWS Glue crawlers to connect to and scan data from various sources such as RDBMS, edge devices, logs, and batch data, and at the same time creates metadata in the centralized AWS Glue Data Catalog. A single AWS Glue job then ingests data for the different databases and tables in parallel into the Amazon S3 raw zone. When ingestion completes, AWS Glue DataBrew jobs perform data quality validation. The curation state machine then runs the AWS Glue curation job and stores the results in the Amazon S3 curated zone. Finally, the enrichment state machine performs the required aggregations and stores the results in the Amazon S3 enrichment zone.

A failure at any stage of the ETL process creates a high-severity ServiceNow ticket and sends an Amazon SNS email alert so that users can take corrective action. The state machines (SMs) are loosely coupled, so the business can easily plug any specific SM into, or out of, the main SM. The AWS Glue Data Catalog and Amazon S3 are governed by AWS Lake Formation, and access privileges are granted based on the required permissions.
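
To make the flow above more concrete, the following sketch builds an Amazon States Language (ASL) definition for a main state machine that chains child state machines and routes any failure to an Amazon SNS notification. It is an assumption-laden illustration, not the solution's actual definition: every stage is modelled as a child state machine for simplicity, and the stage names, ARNs, and account ID are placeholders. Removing an entry from the input mapping shows how a stage can be plugged out of the main state machine.

```python
import json
from typing import Dict


def build_main_state_machine(child_arns: Dict[str, str], topic_arn: str) -> dict:
    """Chain child state machines in mapping order, with a shared failure handler."""
    stage_names = list(child_arns)
    if not stage_names:
        raise ValueError("at least one stage is required")
    states: Dict[str, dict] = {}
    for index, name in enumerate(stage_names):
        state = {
            "Type": "Task",
            # Step Functions service integration that runs a nested execution
            # and waits for it to finish.
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {"StateMachineArn": child_arns[name], "Input.$": "$"},
            "ResultPath": None,  # discard child output, pass the original input on
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": "NotifyFailure",
            }],
        }
        if index + 1 < len(stage_names):
            state["Next"] = stage_names[index + 1]
        else:
            state["End"] = True
        states[name] = state
    states["NotifyFailure"] = {
        "Type": "Task",
        "Resource": "arn:aws:states:::sns:publish",
        "Parameters": {
            "TopicArn": topic_arn,
            "Message.$": "States.JsonToString($.error)",
        },
        "Next": "PipelineFailed",
    }
    states["PipelineFailed"] = {"Type": "Fail", "Error": "EtlStageFailed"}
    return {"StartAt": stage_names[0], "States": states}


# Placeholder ARNs; the account ID and names are hypothetical.
prefix = "arn:aws:states:us-east-1:111122223333:stateMachine:"
stages = {name: prefix + name for name in
          ["Crawl", "Ingest", "DataQuality", "Curate", "Enrich"]}
topic = "arn:aws:sns:us-east-1:111122223333:etl-alerts"

definition = build_main_state_machine(stages, topic)

# Plugging out one stage only requires leaving it out of the mapping.
without_dq = {k: v for k, v in stages.items() if k != "DataQuality"}
reduced = build_main_state_machine(without_dq, topic)

print(json.dumps(definition, indent=2))
```

Because each stage only references the next stage by name, the main definition can be regenerated whenever a stage is added or removed, without touching the child state machines.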

Figure 2 – Intelligent Ingestion solution – Process flow diagram

Conclusion

In this blog, we discussed the challenges organizations face with existing ETL processes and how HCLTech’s Intelligent Ingestion solution revolutionizes the end-to-end data processing and management journey by:

  • Enabling rapid ETL (no code/low code) with sustainable data strategies
  • Provisioning centralized data cataloging with effective schema evolution
  • Offering pre-built, re-usable advanced data transformations for batch and real-time workloads
  • Providing robust data quality and governance capabilities

In summary, this solution standardizes the entire ETL lifecycle, reduces manual effort, and ensures data integrity, consistency, and reliability. Moreover, it is a fully scalable and cost-effective solution that empowers organizations to make informed decisions, drive innovation, and propel their growth in an evolving data-driven landscape.

For more information or to schedule a demo session, reach out to HCLTech at DNA_DATA_BI_FABRIC@hcl.com. You can also learn more about HCLTech solutions on AWS Marketplace.


HCLTech – AWS Partner Spotlight

HCLTech is an AWS Premier Consulting Partner uniquely positioned to help enterprises as a GSI and an ISV. HCLTech is supercharging progress for hundreds of leading global enterprises, vested in solving day-to-day or complex challenges with a dedicated full-stack business unit. HCLTech also holds Generative AI, Migration, DevOps, SAP, Storage and Mainframe Modernization Competencies and is an MSP Partner.
