AWS Machine Learning Blog

Enable conversational chatbots for telephony using Amazon Lex and the Amazon Chime SDK

Conversational AI can deliver powerful, automated, interactive experiences through voice and text. Amazon Lex is a service that combines automatic speech recognition and natural language understanding technologies, so you can build these sophisticated conversational experiences. A common application of conversational AI is found in contact centers: self-service virtual agents. We’re excited to announce that you can now use Amazon Chime SDK Public Switched Telephone Network (PSTN) audio to enable conversational self-service applications to reduce call resolution times and automate informational responses.

The Amazon Chime SDK is a set of real-time communications components that developers can use to add audio, messaging, video, and screen-sharing to their web and mobile applications. Amazon Chime SDK PSTN audio integration with Amazon Lex enables builders to develop conversational interfaces for calls to or from the public telephone network. You can now build AI-powered self-service applications such as conversational interactive voice response systems (IVRs), virtual agents, and other telephony applications that use Session Initiation Protocol (SIP) for voice communications.

In addition, we have launched several new features. Amazon Voice Focus for PSTN provides deep learning-based noise suppression to reduce unwanted noise on calls. You can also now use machine learning (ML)-driven text-to-speech in your application through our native integration with Amazon Polly. All features are now directly integrated with Amazon Chime SDK PSTN audio.

In this post, we teach you how to build a conversational IVR system for a fictitious travel service that accepts reservations over the phone using Amazon Lex.

Solution overview

Amazon Chime SDK PSTN audio makes it easy for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions.

For this solution, we use the following components:

  • Amazon Chime SDK PSTN audio
  • AWS Lambda
  • Amazon Lex
  • Amazon Polly

Amazon Lex natively integrates with Amazon Polly to provide text-to-speech capabilities. In this post, we also enable Amazon Voice Focus to reduce background noise on phone calls. In a previous post, we showed how to integrate with Amazon Lex v1 using the API. That is no longer required. The heavy lifting of working with Amazon Lex and Amazon Polly is now replaced by a few simple function calls.

The following diagram illustrates the high-level design of the Amazon Chime SDK Amazon Lex chatbot system.

To help you learn to build using the Amazon Chime SDK PSTN audio service, we have published a repository of source code and documentation explaining how that source code works. The source code is in a workshop format, with each example program building upon the previous lesson. The final lesson is how to build a complete Amazon Lex-driven chatbot over the phone. That is the lesson we focus on in this post.

As part of this solution, you create the following resources:

  • SIP media application – A managed object that specifies a Lambda function to invoke.
  • SIP rule – A managed object that specifies the phone number to trigger on and the SIP media application to use to invoke a Lambda function.
  • Phone number – An Amazon Chime SDK PSTN phone number provisioned for receiving phone calls.
  • Lambda function – A function written in TypeScript that is integrated with the PSTN audio service. It receives invocations from the SIP media application and sends actions back that instruct the SIP media application to perform Amazon Polly and Amazon Lex tasks.

The demo code is deployed in two parts. The Amazon Lex chatbot example is one of a series of workshop examples that teach how to use Amazon Chime SDK PSTN audio. For this post, you complete the following high-level steps to deploy the chatbot:

  1. Configure the Amazon Lex chatbot.
  2. Clone the code from the GitHub repository.
  3. Deploy the common resources for the workshop (including a phone number).
  4. Deploy the Lambda function that connects Amazon Lex to the phone number.

We go through each step in detail.

Prerequisites

You must have the following prerequisites:

  • Node.js v12+ and npm installed
  • The AWS Command Line Interface (AWS CLI) installed
  • Node Version Manager (nvm) installed
  • The node modules typescript and aws-sdk (using nvm) installed
  • AWS credentials configured for the account and Region that you use for this demo
  • Permissions to create Amazon Chime SIP media applications and phone numbers (make sure your service quota in us-east-1 or us-west-2 for phone numbers, voice connectors, SIP media applications, and SIP rules hasn’t been reached)
  • Deployment must be done in us-east-1 or us-west-2 to align with PSTN audio resources

For detailed installation instructions, including a script that can automate the installation and an AWS Cloud Development Kit (AWS CDK) project to easily create an Amazon Elastic Compute Cloud (Amazon EC2) development environment, see the workshop instructions.

Configure the Amazon Lex chatbot

You can build a complete conversational voice bot using Amazon Lex. In this example, you use the Amazon Lex console to build a bot. We skip the steps where you build the Lambda function for Amazon Lex. The focus here is how to connect Amazon Chime SDK PSTN audio to Amazon Lex. For instructions on building custom Amazon Lex bots, refer to Amazon Lex: How It Works. In this example, we use the pre-built “book trip” example.

Create a bot

To create your chatbot, complete the following steps:

  1. Sign in to the Amazon Lex console in the same Region where you deploy the Amazon Chime SDK resources.

This must be either us-east-1 or us-west-2, depending on where you deploy the Amazon Chime SDK resources using AWS CDK.

  2. In the navigation pane, choose Bots.
  3. Choose Create bot.
  4. Select Start with an example.
  5. For Bot name, enter a name (for example, BookTrip).
  6. For Description, enter an optional description.
  7. Under IAM permissions, select Create a role with basic Amazon Lex permissions.
  8. Under Children’s Online Privacy Protection Act, select No.

This example doesn’t need that protection, but when you create your own bot, select the option that applies to it.

  9. Under Idle session timeout, set Session timeout to 1 minute.
  10. You can skip the Advanced settings section.
  11. Choose Next.
  12. For Select Language, choose your preferred language (for this post, we choose English (US)).
  13. For Voice interaction, choose the voice you want to use.
  14. You can enter a voice sample and choose Play to test the phrase and confirm the voice is to your liking.
  15. Leave other settings at their default.
  16. Choose Done.
  17. In the Fulfillment section, enter the following text for On successful fulfillment:

Thank you!  We'll see you on {CheckInDate}.

  18. Under Closing responses, enter the following text for Message:

Goodbye!

  19. Choose Save intent.
  20. Choose Build.

The build process takes a few moments to complete. When it’s finished, you can test the bot on the Amazon Lex console.

Create a version

You have now built the bot. Next, we create a version.

  1. Navigate to the Versions page of your bot (under the bot name in the navigation pane).
  2. Choose Create version.
  3. Accept all the default values and choose Create.

Your new version is now listed on the Versions page.

Create an alias

Next, we create an alias.

  1. In the navigation pane, choose Aliases.
  2. Choose Create alias.
  3. For Alias name, enter a name (for example, production).
  4. Under Associate with a version, choose Version 1 on the drop-down menu.

If you had more than one version of the bot, you could choose the appropriate version here.

  5. Choose Create.

The alias is now listed on the Aliases page.

  6. On the Aliases page, choose the alias you just created.
  7. Under Resource-based policy, choose Edit.
  8. Add the following policy, which allows Amazon Chime SDK PSTN audio to invoke Amazon Lex on your behalf:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SMALexAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": "voiceconnector.chime.amazonaws.com"
      },
      "Action": "lex:StartConversation",
      "Resource": "<Resource-ARN-for-the-Alias>",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "<account-num>"
        },
        "ArnEquals": {
          "AWS:SourceArn": "arn:aws:voiceconnector:<region>:<account-num>:*"
        }
      }
    }
  ]
}

In the preceding code, provide the resource ARN (located directly above the text box), which is the ARN for the bot alias. Also provide your account number and specify the Region you’re deploying into (us-east-1 or us-west-2). Together, these values define the ARN of the PSTN audio control plane in your account.

  9. Choose Save to store the policy.
  10. Choose Copy next to the resource ARN to use in a later step.
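
If you prefer to fill in the policy placeholders programmatically rather than by hand, the following TypeScript sketch builds the same policy document from three inputs. The helper name buildLexAliasPolicy and its parameter names are hypothetical and not part of the workshop code; the policy body mirrors the one shown above.

```typescript
// Hypothetical helper (not from the workshop repository): fills in the
// placeholders of the resource-based policy shown above.
interface LexAliasPolicyInput {
  botAliasArn: string; // the resource ARN shown above the policy text box
  accountId: string; // your AWS account number
  region: "us-east-1" | "us-west-2"; // Regions named in this post
}

function buildLexAliasPolicy(input: LexAliasPolicyInput): string {
  const policy = {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "SMALexAccess",
        Effect: "Allow",
        Principal: { Service: "voiceconnector.chime.amazonaws.com" },
        Action: "lex:StartConversation",
        Resource: input.botAliasArn,
        Condition: {
          StringEquals: { "AWS:SourceAccount": input.accountId },
          ArnEquals: {
            "AWS:SourceArn": `arn:aws:voiceconnector:${input.region}:${input.accountId}:*`,
          },
        },
      },
    ],
  };
  // Pretty-print so the result can be pasted into the console text box.
  return JSON.stringify(policy, null, 2);
}
```

You can paste the returned string into the Resource-based policy editor in place of the hand-edited version.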

Congratulations! You have configured an Amazon Lex bot!

In a real chatbot application, you would almost certainly implement a Lambda function to process the intents. This demo program focuses on explaining how to connect to Amazon Chime SDK PSTN audio, so we don’t go into that level of detail. For more information, refer to Add the Lambda Function as a Code Hook.

Clone the GitHub repository

You can get the code for the entire workshop by cloning the repository:

git clone https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-workshop
cd amazon-chime-sdk-pstn-audio-workshop

Deploy the common resources for the workshop

This workshop uses the AWS CDK to automate the deployment of all needed resources (except the Amazon Lex bot, which you already created). To deploy, run the following code from your terminal:

cdk bootstrap
yarn deploy

The AWS CDK deploys the resources. We do the bootstrap step to make sure that AWS CDK is properly initialized in the Region you’re deploying into. Note that these examples use AWS CDK version 2.

The repository has a series of lessons that are designed to explain how to develop PSTN audio applications. We recommend reviewing these documents to understand the basics using the first few sample programs. You can then review the Lambda sample program folder. Lastly, follow the steps to configure and then deploy your code. In the terminal, enter the following command:

cd lambdas/call-lex-bot

Configure your Lambda function to use the Amazon Lex bot ARN

Open the src/index.ts source code file for the Lambda function and edit the variable botAlias near the top of the file (provide the ARN you copied earlier):

const botAlias = "<Resource-ARN-for-the-Alias>";
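
A mistyped or unreplaced ARN is an easy mistake at this step. The following sketch checks that a string has the shape of the bot alias ARN used later in this post (arn:aws:lex:<region>:<account-num>:bot-alias/<bot-id>/<alias-id>). The function parseBotAliasArn is a hypothetical helper, and the pattern is an assumption inferred from that sample ARN, not an official validation rule.

```typescript
// Hypothetical helper: splits a bot alias ARN into its parts, or returns
// null when the string does not match the assumed shape.
interface BotAliasArnParts {
  region: string;
  accountId: string;
  botId: string;
  aliasId: string;
}

function parseBotAliasArn(arn: string): BotAliasArnParts | null {
  // Assumed pattern, based on the sample ARN in this post.
  const pattern =
    /^arn:aws:lex:([a-z0-9-]+):(\d{12}):bot-alias\/([A-Z0-9]+)\/([A-Z0-9]+)$/;
  const match = pattern.exec(arn);
  if (match === null) {
    return null;
  }
  return {
    region: match[1],
    accountId: match[2],
    botId: match[3],
    aliasId: match[4],
  };
}
```

A null result for your botAlias value suggests the placeholder was not replaced or the ARN was copied incompletely.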

You can now deploy the bot with yarn deploy and swap the new Lambda function into PSTN audio with yarn swap. You can also note the welcome text in the startBotConversationAction object:

const startBotConversationAction = {
  Type: "StartBotConversation",
  Parameters: {
    BotAliasArn: "none",
    LocaleId: "en_US",
    Configuration: {
      SessionState: {
        DialogAction: {
          Type: "ElicitIntent"
        }
      },
      WelcomeMessages: [
        {
          ContentType: "PlainText",
          Content: "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to book a room, or, I'd like to rent a car."
        },
      ]
    }
  }
}

Amazon Lex starts the bot and uses Amazon Polly to read that text. This gives the caller a greeting, and tells them what they should do next.
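
Note that BotAliasArn in the object above is the placeholder "none", so it presumably has to be replaced with the botAlias value before the action is returned to PSTN audio. The following is a minimal sketch of one way to do that, assuming you prefer to build a fresh action per call instead of editing a shared object. The function makeStartBotConversation is hypothetical and is not taken from the workshop code.

```typescript
// Hypothetical helper, not from the workshop code: returns a new
// StartBotConversation action with the alias ARN and greeting filled in.
interface StartBotConversationAction {
  Type: "StartBotConversation";
  Parameters: {
    BotAliasArn: string;
    LocaleId: string;
    Configuration: {
      SessionState: { DialogAction: { Type: "ElicitIntent" } };
      WelcomeMessages: { ContentType: "PlainText"; Content: string }[];
    };
  };
}

function makeStartBotConversation(
  botAliasArn: string,
  welcomeText: string,
  localeId: string = "en_US",
): StartBotConversationAction {
  return {
    Type: "StartBotConversation",
    Parameters: {
      BotAliasArn: botAliasArn,
      LocaleId: localeId,
      Configuration: {
        // Ask Amazon Lex to work out the caller's intent from what they say.
        SessionState: { DialogAction: { Type: "ElicitIntent" } },
        WelcomeMessages: [{ ContentType: "PlainText", Content: welcomeText }],
      },
    },
  };
}
```

Building the action per call keeps the greeting and the alias ARN together in one place, which can make it easier to vary the welcome text later.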

How it works

The following example adds more actions to what we learned in the Call and Bridge Call lesson. The NEW_INBOUND_CALL event arrives and is processed the same way. We enable Amazon Voice Focus (which enhances the ability of Amazon Lex to understand words) and then immediately hand the incoming call off to the bot with a StartBotConversation action. An example response containing these actions looks like the following object:

{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Pause",
            "Parameters": {
                "DurationInMilliseconds": "1000"
            }
        },
        {
            "Type": "VoiceFocus",
            "Parameters": {
                "Enable": true,
                "CallId": "2947dfba-0748-46fc-abc5-a2c21c7569eb"
            }
        },
        {
            "Type": "StartBotConversation",
            "Parameters": {
                "BotAliasArn": "arn:aws:lex:us-east-1:<account-num>:bot-alias/RQXM74UXC7/ZYXLOINIJL",
                "LocaleId": "en_US",
                "Configuration": {
                    "SessionState": {
                        "DialogAction": {
                            "Type": "ElicitIntent"
                        }
                    },
                    "WelcomeMessages": [
                        {
                            "ContentType": "PlainText",
                            "Content": "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to order flowers."
                        }
                    ]
                }
            }
        }
    ]
}
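
A response like the one above can be assembled in the Lambda function from the incoming event. The following is a sketch under stated assumptions: the function name answerWithBot and the trimmed-down types are hypothetical, and the call ID is assumed to be available at CallDetails.Participants[0].CallId in the NEW_INBOUND_CALL event.

```typescript
// Trimmed-down, hypothetical types covering only the fields this sketch uses.
interface SmaAction {
  Type: string;
  Parameters: Record<string, unknown>;
}

interface SmaResponse {
  SchemaVersion: "1.0";
  Actions: SmaAction[];
}

interface InboundCallEvent {
  CallDetails: { Participants: { CallId: string }[] };
}

// Builds the Pause, VoiceFocus, StartBotConversation sequence shown above.
function answerWithBot(event: InboundCallEvent, botAction: SmaAction): SmaResponse {
  const firstLeg = event.CallDetails.Participants[0];
  if (firstLeg === undefined) {
    throw new Error("event has no participants");
  }
  return {
    SchemaVersion: "1.0",
    Actions: [
      { Type: "Pause", Parameters: { DurationInMilliseconds: "1000" } },
      // Enable noise suppression on the caller's leg before the bot listens.
      { Type: "VoiceFocus", Parameters: { Enable: true, CallId: firstLeg.CallId } },
      botAction,
    ],
  };
}
```

The order matters here: Amazon Voice Focus is enabled before the bot starts, so the noise suppression is already active when the caller speaks.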

When the bot conversation ends, your Lambda function is invoked with an ACTION_SUCCESSFUL event that includes the data collected by the Amazon Lex bot, and your function can use that data if needed. However, a common practice for building Amazon Lex applications is to process the data in the function associated with the Amazon Lex bot. Examples of the event and the returned action are provided in the workshop documentation for this lesson.
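
If you do want to read the collected data in your PSTN audio Lambda function, the following sketch pulls the slot values out of the event. The function collectSlotValues is hypothetical, and the field path (ActionData.IntentResult.SessionState.Intent.Slots, with each slot carrying Value.InterpretedValue) is an assumption about the event shape; check it against the sample event in the workshop documentation before relying on it.

```typescript
// Hypothetical, partial types: only the fields this sketch reads.
interface LexSlot {
  Value?: { OriginalValue?: string; InterpretedValue?: string } | null;
}

interface BotResultEvent {
  InvocationEventType: string;
  ActionData?: {
    Type?: string;
    IntentResult?: {
      SessionState?: {
        Intent?: { Name?: string; Slots?: Record<string, LexSlot | null> };
      };
    };
  };
}

// Returns a map of slot name to value, skipping slots the caller left empty.
function collectSlotValues(event: BotResultEvent): Record<string, string> {
  const collected: Record<string, string> = {};
  if (
    event.InvocationEventType !== "ACTION_SUCCESSFUL" ||
    event.ActionData?.Type !== "StartBotConversation"
  ) {
    return collected;
  }
  const slots = event.ActionData?.IntentResult?.SessionState?.Intent?.Slots ?? {};
  for (const name of Object.keys(slots)) {
    const slot = slots[name];
    // Prefer the value Amazon Lex resolved; fall back to the raw utterance.
    const value = slot?.Value?.InterpretedValue ?? slot?.Value?.OriginalValue;
    if (value !== undefined) {
      collected[name] = value;
    }
  }
  return collected;
}
```

For the book trip bot, the result would be expected to hold entries such as CheckInDate once the caller has answered the bot's questions.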

Sequence diagram

The following diagram shows the sequence of calls made between PSTN audio and the Lambda function:

For a more detailed explanation of the operation, refer to the workshop documentation.
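
The exchange between PSTN audio and the Lambda function can also be read as a small event router. The following is a minimal sketch, assuming the function starts the bot on NEW_INBOUND_CALL, hangs up once the bot action finishes or fails, and returns no actions for anything else (such as HANGUP). The route function and its types are hypothetical, and the workshop code may handle these events differently.

```typescript
// Hypothetical, trimmed-down types for the request and response objects.
interface SmaEvent {
  InvocationEventType: string;
}

interface SmaAction {
  Type: string;
  Parameters: Record<string, unknown>;
}

interface SmaResponse {
  SchemaVersion: "1.0";
  Actions: SmaAction[];
}

// Assumed hangup action that ends the caller's leg.
const hangupAction: SmaAction = {
  Type: "Hangup",
  Parameters: { SipResponseCode: "0", ParticipantTag: "LEG-A" },
};

function route(event: SmaEvent, startBot: SmaAction): SmaResponse {
  let actions: SmaAction[];
  switch (event.InvocationEventType) {
    case "NEW_INBOUND_CALL":
      // First invocation for a call: hand the caller to the bot.
      actions = [startBot];
      break;
    case "ACTION_SUCCESSFUL":
    case "ACTION_FAILED":
      // The bot conversation is over (or could not run): end the call.
      actions = [hangupAction];
      break;
    default:
      // For example HANGUP: nothing left to do.
      actions = [];
  }
  return { SchemaVersion: "1.0", Actions: actions };
}
```

Returning an empty action list for unhandled events keeps the function from issuing actions on a call that has already ended.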

Clean up

To clean up the resources used in this demo and avoid incurring further charges, complete the following steps:

  1. In the terminal, enter the following code:

yarn destroy

  2. Return to the workshop folder (cd ../../) and enter the following code:

yarn destroy

The AWS CloudFormation stack created by the AWS CDK is destroyed, removing all the allocated resources.

Conclusion

In this post, you learned how to build a conversational interactive voice response (IVR) system using Amazon Lex and Amazon Chime SDK PSTN audio. You can use these techniques to build your own system to reduce call resolution times and automate informational responses on your customers’ calls.

For more information, see the project GitHub repository and Using the Amazon Chime SDK PSTN Audio service.


About the Author

Greg Herlein has led software teams for over 25 years at large and small companies, including several startups. He is currently the Principal Evangelist for the Amazon Chime SDK service, where he is passionate about helping customers build advanced communications software.