AWS Public Sector Blog
Building your first generative AI conversational experience on AWS
Amazon Web Services (AWS) offers a variety of options for building chat-based assistants with generative artificial intelligence (AI) capabilities. This post presents some of these options in plain terms, along with what to keep in mind when deciding which to use and how to get started.
To create a chat-based application, you need the following components:
- A user interface to input text and show results
- A large language model (LLM) that can interpret the prompt and deliver a response
- A way to access the model (whether hosting it or accessing it through an API)
- A knowledge base so the model can improve the response based on that specific data
In the next sections, this post will explain some terms and review AWS services you can use to build a generative AI conversational assistant.
Introduction to LLMs and RAG
Generative AI and large language models (LLMs) are transforming natural language processing by enabling more capable conversational AI and improved employee productivity. LLMs are very large deep learning models that are pre-trained on vast amounts of data. You can tailor their responses to a specific use case by taking company data into account and filtering results according to user permissions.
Since company data is external to the LLM (trained on different data sources), you need a way to bring it as context. This is where Retrieval-Augmented Generation (RAG) comes in. RAG is a technique to retrieve relevant company information to provide context for the AI model. The intelligent search capabilities of Amazon Kendra excel at RAG by providing accurate, relevant passages from company documents and data sources. Amazon Kendra offers pre-trained deep learning models across 14 domains, eliminating the need for machine learning (ML) expertise.
The Retrieve API, designed for the RAG use case, can retrieve up to 100 semantically relevant passages of up to 200 token words each, ordered by relevance. Amazon Kendra comes with pre-built connectors to data sources like Amazon Simple Storage Service (Amazon S3), SharePoint, Confluence, and websites, and it supports common document formats such as HTML, Word, PowerPoint, PDF, Excel, and plain text files. To filter responses to only those documents that the end user's permissions allow, Amazon Kendra offers connectors with access control list (ACL) support and integrates with AWS identity services such as AWS Identity and Access Management (IAM) and AWS IAM Identity Center.
Amazon Kendra maximizes RAG accuracy by retrieving optimized context passages. It also enables filtering responses by user permissions. This makes it well-suited for building safe, useful enterprise conversational AI.
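As an illustration, a minimal sketch of calling the Retrieve API with boto3 might look like the following (the index ID is a placeholder, and the number of passages to fetch is a tuning choice):

```python
import boto3

# Placeholder index ID; replace with your Amazon Kendra index
KENDRA_INDEX_ID = "00000000-0000-0000-0000-000000000000"

kendra = boto3.client("kendra")

def retrieve_passages(question: str, top_k: int = 5) -> list[str]:
    """Return the most relevant passages for a user question, ordered by relevance."""
    response = kendra.retrieve(
        IndexId=KENDRA_INDEX_ID,
        QueryText=question,
        PageSize=top_k,
    )
    # Each result item contains an excerpt of a source document
    return [item["Content"] for item in response["ResultItems"]]
```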
The following high-level steps show how a conversational assistant uses RAG with Amazon Kendra to provide answers to a user.
- The user makes a request to the generative AI app.
- The app issues a search query to the Amazon Kendra index based on the user request.
- The index returns search results with excerpts of relevant documents from the ingested enterprise data.
- The app sends the user request and the data retrieved from the index as context in the LLM prompt.
- The LLM returns a succinct response to the user request based on the retrieved data.
- The response is sent back to the user.
The following image shows the high-level architecture of RAG with Amazon Kendra.
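The prompt augmentation in these steps can be as simple as concatenating the retrieved excerpts with the user request. The template below is a hypothetical sketch building on the retrieval helper above; the exact wording and format depend on the model you use:

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved excerpts and the user request into a single LLM prompt."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```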
The knowledge base is ready. But what about the user interface, the LLM, and a way to access the model?
Amazon SageMaker JumpStart and Amazon Kendra
The first option to host and access an LLM is Amazon SageMaker JumpStart, an ML hub with foundation models (FMs), built-in algorithms, and pre-built ML solutions you can deploy in a few steps. You can use it to evaluate, compare, and select FMs based on predefined metrics. Some pre-trained models are fully customizable for your use case with your data, and you can seamlessly deploy them into production with the user interface or AWS SDKs. In addition, you can access pre-built solutions to solve common use cases and share ML artifacts, including ML models and notebooks.
In the context of building a conversational interface, you can use proprietary and publicly available FMs from providers like AI21 Labs, Cohere, Databricks, Hugging Face, Meta, Mistral AI, Stability AI, and Alexa to perform a wide range of tasks such as article summarization and text, image, or video generation.
The following image shows the architecture for using Amazon SageMaker with Amazon Kendra.
SageMaker JumpStart is designed for individuals and organizations seeking comprehensive control over how and where to deploy generative AI models for training and inference. It enables the hosting of custom LLMs, offering flexibility to address diverse use cases. You still need to configure the platform, estimate compute usage, and build the app or user interface itself.
You can get started by following the AWS Machine Learning blog post Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart. Amazon SageMaker pricing is based on compute and offers On-Demand and Savings Plans models. For supported AWS Regions and quotas, refer to the Amazon SageMaker Developer Guide.
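As a rough sketch, deploying and querying a JumpStart text generation model with the SageMaker Python SDK could look like the following. The model ID, instance type, and payload format are illustrative and vary by model:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative model ID; pick any text generation model from the JumpStart catalog
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

# Deploys a real-time inference endpoint (this incurs compute charges)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Payload format depends on the model; many text generation models accept "inputs"
response = predictor.predict({
    "inputs": "Summarize the benefits of Retrieval-Augmented Generation.",
    "parameters": {"max_new_tokens": 256},
})
print(response)

# Delete the endpoint when finished to stop incurring charges
predictor.delete_endpoint()
```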
Amazon Bedrock and Amazon Kendra
The second option for accessing an LLM is Amazon Bedrock, the easiest way to make LLMs available for your use through a unified API. You can call high-performing FMs from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API.
You can use Amazon Bedrock to experiment and evaluate LLMs, privately customize them with your data using RAG, and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is fully managed and serverless, you don’t have to manage any infrastructure.
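For example, a minimal sketch of calling an FM through the unified Converse API with boto3 might look like this (the model ID is illustrative; any Amazon Bedrock model you have enabled works):

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Illustrative model ID; swap in the model you want to evaluate
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "What is Retrieval-Augmented Generation?"}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```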
You can get started by experimenting with the workshop Create a Serverless Chatbot Using Amazon Bedrock, Amazon Kendra, and Your Own Data.
The following high-level steps show how a conversational assistant using Amazon Bedrock with Amazon Kendra and other AWS services provides answers to a user.
- A user asks questions to the chat-based assistant through an application that is hosted through AWS Amplify and Amazon CloudFront.
- The question is sent to the AWS Lambda RAG function through Amazon API Gateway.
- To retrieve the relevant context from the document source in an S3 bucket, the RAG function calls the Amazon Kendra API.
- The RAG function then sends the content returned from the API, along with a pre-built prompt, to the FM in Amazon Bedrock.
- The response from the FM is sent back to the user.
The following image shows the architecture for this process.
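A hedged sketch of the RAG Lambda function at the center of this flow is shown below. The index ID, model ID, and request shape are assumptions (an API Gateway proxy integration with a JSON body is assumed), and it mirrors the retrieval and prompt-building helpers sketched earlier:

```python
import json
import boto3

kendra = boto3.client("kendra")
bedrock_runtime = boto3.client("bedrock-runtime")

KENDRA_INDEX_ID = "00000000-0000-0000-0000-000000000000"  # placeholder index ID
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"       # illustrative model ID

def handler(event, context):
    # API Gateway proxy integration: the question arrives in the request body
    question = json.loads(event["body"])["question"]

    # Retrieve relevant excerpts from the Amazon Kendra index
    results = kendra.retrieve(IndexId=KENDRA_INDEX_ID, QueryText=question, PageSize=5)
    context_text = "\n\n".join(item["Content"] for item in results["ResultItems"])

    # Send the user request plus the retrieved context to the FM in Amazon Bedrock
    prompt = f"Use the context to answer.\n\nContext:\n{context_text}\n\nQuestion: {question}"
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]

    # Return the response to the caller through API Gateway
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```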
Amazon Bedrock pricing is based on model inference and customization. You can choose between two pricing plans for inference. With On-Demand and Batch mode, you can use FMs on a pay-as-you-go basis without making any time-based term commitments. With Provisioned Throughput mode, you provision sufficient throughput to meet your application’s performance requirements in exchange for a time-based term commitment. For more information, see Supported AWS Regions and Amazon Bedrock endpoints and quotas.
Knowledge Bases for Amazon Bedrock
An alternative to Amazon Kendra for RAG is Knowledge Bases for Amazon Bedrock. This service features knowledge bases as a fully managed capability designed to implement the entire RAG workflow, from data ingestion to retrieval and prompt augmentation. Knowledge Bases for Amazon Bedrock simplifies the data ingestion process by allowing you to point to your data in Amazon S3 (or other sources such as a web crawler, SharePoint, Confluence, and more), and it handles the entire workflow of generating embeddings and storing them in a vector database. It has dedicated APIs, Retrieve and RetrieveAndGenerate, to simplify retrieving relevant results and augmenting the FM prompt.
At runtime, an embedding model is used to convert the user’s query to a vector. The vector index is then queried to find chunks that are semantically similar to the user’s query by comparing document vectors to the user query vector. In the final step, the user prompt is augmented with the additional context from the chunks retrieved from the vector index. The prompt and the additional context are then sent to the model to generate a response for the user. The following image shows this workflow.
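A minimal sketch of calling RetrieveAndGenerate with boto3, which performs the retrieval, prompt augmentation, and generation in a single call, could look like this (the knowledge base ID and model ARN are placeholders):

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is our remote work policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB0123456789",  # placeholder knowledge base ID
            "modelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-haiku-20240307-v1:0"  # illustrative model
            ),
        },
    },
)
print(response["output"]["text"])
```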
To check Region availability for Knowledge Bases for Amazon Bedrock, visit Supported Regions and models for Knowledge Bases for Amazon Bedrock.
When building a chat-based assistant, you might want to safeguard the content and operate with responsible AI policies. For this purpose, consider Guardrails for Amazon Bedrock, which lets you implement policies such as denied topics, content filters, word filters, and sensitive information filters to keep undesirable or harmful content out of the conversational experience and prevent the exposure of sensitive information.
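Once a guardrail has been created, applying it to a model invocation is a matter of passing its identifier. The sketch below uses the Converse API; the guardrail ID, version, and model ID are placeholders:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Give me investment advice."}]}],
    guardrailConfig={
        "guardrailIdentifier": "abc123def456",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
# If the guardrail intervenes, stopReason indicates it and a preconfigured message is returned
print(response["stopReason"])
print(response["output"]["message"]["content"][0]["text"])
```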
Amazon Bedrock is the easiest way to access an LLM through an API. The service provides capabilities such as agents that execute tasks, fully managed knowledge bases to implement RAG, and guardrails that protect the content of a conversational experience. You still need to build the app or user interface itself with, for example, AWS Amplify and Amazon CloudFront.
QnABot on AWS
If you are looking for a multipurpose option for building your chat-based assistant, consider QnABot on AWS. It is a multichannel, multi-language conversational interface that responds to your customer’s questions, answers, and feedback. It allows you to deploy a fully functional chat-based assistant across channels, including chat, voice, SMS, and Amazon Alexa.
The main benefit of using QnABot is that it is part of the AWS Solutions Library, following AWS Well-Architected best practices, and is maintained and updated by AWS. It comes with a CloudFormation template that is ready to launch. The solution allows you to build, publish, and monitor assistants. It uses services such as Amazon Lex, Amazon Translate, Amazon Comprehend, and Amazon Polly to allow multi-language text and audio support.
The default option is to work with a Q&A bank of questions stored in Amazon OpenSearch Service. You can manually input those questions and answers in the admin panel of the content designer. However, it is possible (and recommended) to use Amazon Kendra for intelligent search (upload documents, crawl websites, and create a Q&A bank of questions there). Amazon Kendra works as a fallback when the question is not found in OpenSearch Service.
QnABot integrates with the Amazon Connect AI-powered contact center, and it can be embedded on a website following the steps in Deploy a Web UI for Your Chatbot. It has pre-built Kibana dashboards to get basic metrics on conversational assistant usage.
The latest version of QnABot comes with the option to use LLMs to disambiguate customer questions by taking conversational context into account. The default model is Falcon-40B-instruct on an Amazon SageMaker endpoint. Since the LLM landscape is constantly evolving, QnABot also integrates with Amazon Bedrock and any other LLM using AWS Lambda. To help you get started, the solution team released sample Lambda functions that allow this integration.
You can get started with the solution at the qnabot-on-aws GitHub repo. The cost of QnABot depends on the underlying services involved. The same applies to supported AWS Regions and quotas.
The following image shows the architecture for QnABot.
Amazon Q Business
Last, Amazon Q Business is a managed generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on the data and information in your enterprise systems.
You can get started testing the service and creating your Amazon Q application by following the steps in Build a contextual chatbot application using Amazon Q for Financial Services. In this AWS Community post, the author uses Amazon S3 to store the knowledge base. The service has more than 40 connectors to sources such as Amazon Relational Database Service (Amazon RDS), Confluence, Dropbox, GitHub, Gmail, Microsoft OneDrive, and Salesforce.
You can use the Amazon Q Business web experience to ask questions and accomplish tasks. Among the alternatives presented previously for building a generative AI conversational experience, this one requires the least effort because it manages all the underlying steps (including selecting an LLM and making it available). It allows you to focus on your business requirements and enriching your knowledge base.
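Programmatic access is also available. A hedged sketch of asking a question through the ChatSync API with boto3 follows; the application ID and user identifier are placeholders:

```python
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="a1b2c3d4-5678-90ab-cdef-example11111",  # placeholder application ID
    userId="user@example.com",                             # placeholder user identifier
    userMessage="Summarize our onboarding process for new hires.",
)
# The generated answer is returned in systemMessage
print(response["systemMessage"])
```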
As of this writing, Amazon Q Business is available in the US East (N. Virginia) and US West (Oregon) AWS Regions. The service quotas can be found in the AWS Documentation, and Amazon Q Business pricing is based on a subscription model per user per month.
In the context of AWS data, how would Amazon Q Business describe itself? “As an AI assistant created by Amazon Web Services (AWS), I am designed to help customers with a wide range of tasks related to building, optimizing, and operating applications and workloads on AWS.
- I am an AI-powered assistant trained on 17 years’ worth of high-quality AWS information, allowing me to provide knowledgeable and relevant assistance on various AWS services and features.
- I am available to users wherever they interact with AWS, such as the AWS Management Console, popular IDEs, and more, and I can easily integrate into existing workflows to help accelerate innovation.
- While I can provide general information about AWS services and capabilities, I do not have access to specific details about pricing, limits, availability, quotas, employees, or leadership. For the most up-to-date information, I recommend referring to the official AWS documentation.”
Conclusion
AWS offers a range of powerful solutions for building your own conversational assistant. Incorporating RAG allows the assistant to retrieve relevant information from a knowledge base, use it as context, and generate responses using an LLM.
These are some options to get started building your first generative AI conversational experience with AWS:
- If you want to use your own FMs (or those not offered on Amazon Bedrock) or want to control and customize the infrastructure, such as type of compute and GPU for training and inference endpoints, use Amazon SageMaker and Amazon Kendra for RAG with your own front end (UI).
- If you want to build your assistant using various LLMs accessed through APIs, with the ability to enhance the results with your proprietary datasets without doing the heavy lifting of managing the underlying infrastructure, use Amazon Bedrock and Amazon Kendra for RAG with your own front end (UI). This front end can be built by using AWS Amplify, Amazon CloudFront, and Amazon API Gateway.
- If you wish to take the previous alternative further and get a fully managed capability specifically designed to implement the entire RAG workflow, from data ingestion to retrieval and prompt augmentation, use Knowledge Bases for Amazon Bedrock for RAG with your own front end (UI).
- If you want a multi-language, multichannel Q&A solution that comes with a CloudFormation template that is ready to launch, use QnABot on AWS.
- If you want to build a generative AI–powered assistant and control the data that goes into the knowledge base, use Amazon Q Business.
This list is not exhaustive. For further information, you can check Q&A applications that use Amazon Aurora PostgreSQL-Compatible Edition or Amazon OpenSearch Service as vector databases for RAG.