AWS Machine Learning Blog

Category: Amazon Textract

Create a document lake using large-scale text extraction from documents with Amazon Textract

AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). However, they’re unable to gain insights such as using the information locked in the documents for large language models (LLMs) or search until they extract the text, forms, […]

Automate PDF pre-labeling for Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you […]

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

Conversational AI has come a long way in recent years thanks to the rapid developments in generative AI, especially the performance improvements of large language models (LLMs) introduced by training techniques such as instruction fine-tuning and reinforcement learning from human feedback. When prompted correctly, these models can carry coherent conversations without any task-specific training data. […]

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and […]

Build well-architected IDP solutions with a custom lens – Part 2: Security

Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. By using the Framework, you will learn current operational and architectural recommendations for designing and operating […]

Build well-architected IDP solutions with a custom lens – Part 3: Reliability

The IDP Well-Architected Custom Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build a secure, efficient, and reliable IDP solution on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer […]

Build well-architected IDP solutions with a custom lens – Part 4: Performance efficiency

When a customer has a production-ready intelligent document processing (IDP) workload, we often receive requests for a Well-Architected review. To build an enterprise solution, developer resources, cost, time and user-experience have to be balanced to achieve the desired business outcome. The AWS Well-Architected Framework provides a systematic way for organizations to learn operational and architectural […]

Build well-architected IDP solutions with a custom lens – Part 5: Cost optimization

Building a production-ready solution in the cloud involves a series of trade-off between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language […]

Build well-architected IDP solutions with a custom lens – Part 6: Sustainability

An intelligent document processing (IDP) project typically combines optical character recognition (OCR) and natural language processing (NLP) to automatically read and understand documents. Customers across all industries run IDP workloads on AWS to deliver business value by automating use cases such as KYC forms, tax documents, invoices, insurance claims, delivery reports, inventory reports, and more. […]

Layout visualization with Amazon Textract Textractor

Amazon Textract’s new Layout feature introduces efficiencies in general purpose and generative AI document processing tasks

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Layout is a new feature that allows customers to automatically extract layout elements such as paragraphs, titles, subtitles, headers, footers, and more from documents. Layout extends Amazon Textract’s word and line detection by automatically […]