Assisting People at Haptik Using Amazon Polly

This is a guest blog post by Swapan Rajdev, Co-Founder & CTO, and Ranvijay Jamwal, Lead DevOps Engineer, at Haptik Inc.

Given the busy lives we all live, our to-do lists keep growing, and it gets harder and harder to keep track of all the things we need to accomplish daily. From remembering our meetings to making sure we buy our next flight ticket, from remembering to drink enough water to making sure we make it to the gym, our lists never end, and their maintenance gets exhausting.

Haptik is India’s first personal-assistant app. Users can use the app to plan travel, check in for flights, book taxis, and set reminders. Of all its features, the most important and most frequently used is Reminders. People use Haptik to set wake-up calls, set up reminders to drink water, call people at different times, send greetings to others for different occasions, and much more. Through the Reminders feature, users receive a notification in the app along with a phone call at the requested time that relays the reminder message.

In this post, we will cover how we use machine learning and text-to-speech (TTS) to set reminders for users – to call them at the given time to remind them of their tasks. We will cover how Amazon Polly helped us make personalized calls to our users and helped us scale our reminders feature to millions of users.

Reminders at Haptik

To get anything done by the personal assistant, the user comes onto the Haptik app and sends the bot a message. Every message in our system goes through a message pipeline in which we try to detect the following:

The domain the user is talking about (reminders, travel, nearby, etc.)
The task (intent) the user wants to get done
The entities (different data required to complete the task of the user)

At the end of this pipeline, if the bot has all the information it needs, it goes ahead and completes the task. Otherwise, it replies with relevant questions to gather the missing information.
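As a rough illustration of the last step of this pipeline, the sketch below decides whether the bot can complete a task or must ask a follow-up question. It is not Haptik's implementation: the `ParsedMessage` type, the `REQUIRED_ENTITIES` table, and the entity names are all hypothetical, and the detection of domain, intent, and entities is assumed to have already happened upstream.

```python
from dataclasses import dataclass, field

# Hypothetical table of entities each (domain, intent) pair needs.
# The names are illustrative assumptions, not Haptik's real schema.
REQUIRED_ENTITIES = {
    ("reminders", "set_reminder"): ["task", "date", "time"],
}


@dataclass
class ParsedMessage:
    """Result of the (assumed) upstream detection step."""
    domain: str
    intent: str
    entities: dict = field(default_factory=dict)


def next_bot_action(parsed):
    """Return ("complete", None) if every required entity is present,
    otherwise ("ask", <first missing entity name>)."""
    required = REQUIRED_ENTITIES.get((parsed.domain, parsed.intent), [])
    missing = [name for name in required if name not in parsed.entities]
    if missing:
        return ("ask", missing[0])
    return ("complete", None)


message = ParsedMessage(
    "reminders", "set_reminder", {"task": "wake up", "date": "2017-06-01"}
)
print(next_bot_action(message))  # the time is still missing, so the bot asks for it
```

Keeping the required entities in a table rather than in code means new task types can be described without touching the decision logic.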

Apart from this basic pipeline, we have many other algorithms that use deep learning to learn from historical chats, so that the bot gets better at completing users’ tasks without their intervention.

Why Call Users?

To remind users about their tasks, we send them a notification on the app along with a phone call. Although Haptik uses common notification techniques to remind users, we believe that calling people works more effectively for a few reasons:

First, in today’s smartphone age, we are all going through a notification overload from every other app, which leads to missing out on some important notifications. A phone call from an unknown number or Haptik is more effective than a regular alarm which often gets the snooze treatment.

Second, we are able to provide a much better user experience by changing the content and voice of the call based on the type of reminder task. For example, we use a soft and calm voice for morning wake-up calls. Occasionally, we add a motivational quote towards the end of the call to make sure the user wakes up pleasantly and is charged up. Implementing such TTS use cases is simple and reliable with Amazon Polly.

How it works

Reminders at Haptik is one of the most complex domains where many different technologies come together to make sure we can make calls to our users in a timely and personalized manner. To successfully set a reminder for the user we capture the following data points from the user:

The reason for setting the reminder (wake-up call, meeting reminder, etc.)
The date of reminder
The time of reminder
If it’s a recurring reminder, what should be the frequency

All of this information is used to derive metadata and is passed to a scheduler whose job is to call the user. The following code snippet shows how a reminder is created:

def create_reminder(user, reminder_task, date, time, repeat_pattern=None):
    # Reject reminders whose date, time, or repeat pattern is invalid.
    is_valid = check_reminder_date_time_validity(date, time, repeat_pattern)
    if not is_valid:
        return False
    notification_content = get_notification_content_for_reminder(user, reminder_task)
    call_script = get_call_script_for_reminder(user, reminder_task)
    # Synthesize the call audio ahead of time; reminder_task is passed so a voice can be chosen.
    audio_url = generate_audio_using_polly(user, call_script, reminder_task)
    return schedule_job(user, audio_url, notification_content)
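The post does not show `check_reminder_date_time_validity`, so the following is only a minimal sketch of what such a check might look like. It assumes `date` and `time` are `datetime.date` and `datetime.time` objects, that the allowed repeat patterns are the hypothetical values listed in `ALLOWED_REPEAT_PATTERNS`, and that time zones can be ignored. The extra `now` parameter is an addition for testability.

```python
import datetime as dt

# Assumed set of repeat patterns; None means a one-off reminder.
ALLOWED_REPEAT_PATTERNS = {None, "daily", "weekly", "monthly"}


def check_reminder_date_time_validity(date, time, repeat_pattern=None, now=None):
    """Return True if the reminder is in the future and its repeat
    pattern is one of the allowed values."""
    if repeat_pattern not in ALLOWED_REPEAT_PATTERNS:
        return False
    # Naive datetimes only: time zones are deliberately ignored here.
    current = now if now is not None else dt.datetime.now()
    return dt.datetime.combine(date, time) > current


fixed_now = dt.datetime(2017, 6, 1, 12, 0)
print(check_reminder_date_time_validity(
    dt.date(2017, 6, 2), dt.time(7, 0), "daily", now=fixed_now
))
```

A real check would likely also need to handle the user's time zone and recurring reminders whose first occurrence is already in the past.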

Using Amazon Polly for TTS

Before we schedule a reminder, we first fetch the script that we want Amazon Polly to synthesize. For this we have a function that fetches the call script based on the type of reminder and the user.

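import random

def get_call_script_for_reminder(user, reminder_task):
    # Fetch the script templates stored for this type of reminder.
    all_call_scripts = CallScriptStore.objects.filter(task_name=reminder_task.name).values_list("script", flat=True)
    # Pick one at random so that calls vary, then fill in the user's name.
    call_script = random.choice(list(all_call_scripts))
    return call_script.format(user_name=user.name)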

Example output:

Rise and shine, Swapan! It's a beautiful day - Time to wake up!
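To make the template mechanism concrete without a database, here is a self-contained sketch that produces the example output above. The in-memory `CALL_SCRIPTS` dictionary stands in for the script store, and both it and the `{user_name}` placeholder syntax are assumptions made for illustration.

```python
import random

# Hypothetical templates keyed by task name; a stand-in for the script store.
CALL_SCRIPTS = {
    "wake_up": [
        "Rise and shine, {user_name}! It's a beautiful day - Time to wake up!",
    ],
}


def render_call_script(task_name, user_name):
    """Pick a random template for the task and fill in the user's name."""
    template = random.choice(CALL_SCRIPTS[task_name])
    return template.format(user_name=user_name)


print(render_call_script("wake_up", "Swapan"))
```

Adding more templates to a task's list is enough to vary the wording between calls.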

After we have the script, we call Amazon Polly to convert the text to speech and upload the audio file to Amazon S3, which we can use later to play during the call. Use the following code to create the audio file (mp3) and upload it to Amazon S3:

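from contextlib import closing

from boto3 import Session

# POLLY_REGION, AUDIO_BUCKET_REGION, AUDIO_BUCKET_NAME, and mp3_file_path
# are assumed to be defined elsewhere (for example, in settings).

def generate_audio_using_polly(user, call_script, reminder_task):
    session = Session()
    polly = session.client("polly", region_name=POLLY_REGION)
    # Convert the call script to speech, using a voice chosen for this task.
    response = polly.synthesize_speech(
        Text=call_script,
        OutputFormat="mp3",
        VoiceId=get_polly_voice_for_task(reminder_task),
    )

    # Write the audio stream to a local mp3 file.
    with closing(response["AudioStream"]) as stream:
        with open(mp3_file_path, "wb") as file:
            file.write(stream.read())

    # Upload to S3. A fixed key is used here for brevity; a real system
    # would need a unique key per reminder to avoid overwriting audio.
    s3 = session.client("s3", region_name=AUDIO_BUCKET_REGION)
    s3.upload_file(mp3_file_path, AUDIO_BUCKET_NAME, "file.mp3")

    url = "https://s3-{0}.amazonaws.com/{1}/{2}".format(
        AUDIO_BUCKET_REGION, AUDIO_BUCKET_NAME, "file.mp3"
    )
    return url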

At the time of the actual reminder, our scheduler system makes an API call to our calling partner using the mobile number and the URL of the call script. A call is then made to the user during which the call script is played; this completes the reminder. We have received a lot of positive feedback on the content and behavior of the call. On any given day we send out more than 100,000 reminders.

Adding personality using Amazon Polly

Using Amazon Polly, you can generate call audio in 51 different voices across 25 languages. This helps you provide a breadth of user experiences. In the previous function, while generating the audio, we call the function `get_polly_voice_for_task` to select a VoiceId. You can get the different voices supported by Amazon Polly by using the following code:

session = Session()
polly = session.client("polly", region_name=POLLY_REGION)
response = polly.describe_voices()
voice_ids = [item['Id'] for item in response['Voices']]

Since most of our audience is in India, we frequently use “Raveena” (an Indian English female voice) because this voice resonates with many of our users.
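The post does not show `get_polly_voice_for_task`, so the following is only a guess at its shape: a lookup table from task name to voice, with a fallback. The task names and the `VOICE_BY_TASK` mapping are hypothetical; "Raveena" and "Aditi" are Amazon Polly voice IDs, but which voice Haptik uses for which task is an assumption. The optional `available_voice_ids` argument can be fed from the `describe_voices` result above.

```python
from types import SimpleNamespace

DEFAULT_VOICE_ID = "Raveena"

# Hypothetical mapping from task name to Polly voice ID.
VOICE_BY_TASK = {
    "wake_up": "Raveena",
    "birthday_greeting": "Aditi",
}


def get_polly_voice_for_task(reminder_task, available_voice_ids=None):
    """Return the voice configured for the task, falling back to the
    default if the task is unknown or the voice is not available."""
    voice_id = VOICE_BY_TASK.get(reminder_task.name, DEFAULT_VOICE_ID)
    if available_voice_ids is not None and voice_id not in available_voice_ids:
        return DEFAULT_VOICE_ID
    return voice_id


# SimpleNamespace stands in for a real reminder task object with a name.
task = SimpleNamespace(name="wake_up")
print(get_polly_voice_for_task(task))
```

Keeping the mapping as data rather than code fits the on-the-fly configuration approach described later in the post.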

[Audio sample: wake-up call reminder, voiced by Amazon Polly]

[Audio sample: birthday greeting, voiced by Amazon Polly]

Why Amazon Polly

We experimented with a number of different TTS services, but Amazon Polly was the frontrunner by miles. Some of the reasons we chose Amazon Polly are:

Speed of development and iteration – The Amazon Polly API is simple and very robust. It took us less than a day to implement the Amazon Polly API calls, and we designed our system so that almost all the configuration can be changed on the fly without any code changes. We have a tool to change the call scripts and a tool to change the voices. This allowed us to experiment and perform A/B testing with a lot of different scripts and voices until we were satisfied with the experience.

Scalability – Based on the architecture described previously, we generate the audio for the call script in advance and store it on Amazon S3, so the audio is ready when it is time to make the call. This helped us scale and trigger thousands of calls at the same time without hindering the user experience.

Reliability and Monitoring – Amazon provides a lot of great tools to monitor our Amazon Polly requests. So far we have experienced near 100% reliability and availability, and we have never faced downtime with Amazon Polly. To be on the safe side, we have created alarms that go off whenever more than 5 requests fail within a period of 5 minutes. You can easily set up such alarms using Amazon CloudWatch; we have synced ours with PagerDuty.

Latency – Amazon CloudWatch provides some great ways to monitor different Amazon Polly metrics. Our graphs show that an Amazon Polly audio file is created in 17 ms on average, for files averaging 85 characters. This is really fast, and it helps us deliver a very good user experience across thousands of concurrent calls.
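The failure alarm mentioned under Reliability and Monitoring could be set up roughly as follows. This is a sketch under assumptions rather than Haptik's actual configuration: the alarm name is made up, "failed requests" is interpreted as the `5XXCount` metric for the `SynthesizeSpeech` operation in the `AWS/Polly` namespace, and the alarm is assumed to notify an SNS topic that forwards to PagerDuty. Check the metric and dimension names against the Amazon Polly CloudWatch documentation before relying on them.

```python
def build_polly_failure_alarm(sns_topic_arn):
    """Build put_metric_alarm arguments for an alarm that fires when more
    than 5 Polly requests fail within one 5-minute period."""
    return {
        "AlarmName": "polly-synthesize-speech-failures",  # hypothetical name
        "Namespace": "AWS/Polly",
        "MetricName": "5XXCount",  # assumption: "failed" means server errors
        "Dimensions": [{"Name": "Operation", "Value": "SynthesizeSpeech"}],
        "Statistic": "Sum",
        "Period": 300,  # seconds, i.e. 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 5,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data points means no failures
        "AlarmActions": [sns_topic_arn],
    }


def create_polly_failure_alarm(sns_topic_arn, region_name):
    """Create the alarm in CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above has no AWS dependency

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    cloudwatch.put_metric_alarm(**build_polly_failure_alarm(sns_topic_arn))
```

Separating the builder from the AWS call makes the alarm definition easy to review and test without credentials.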

Conclusion

Every day, we at Haptik try to make our users’ lives easier by providing a simple way to get things done. In the future, we plan to add support for multiple languages, and an advantage of using Amazon Polly is that it already supports 24 different languages. Apart from that, we are always tweaking our machine learning algorithms to understand more from our users, and we are finding different ways to use technology to serve our users better. We hope you found this post useful.

Additional Reading

Be sure to read the Haptik Case Study, “Haptik Supports 30% Monthly Increase in App Downloads Using AWS.”

About the Authors

Swapan Rajdev is the Co-Founder & CTO and Ranvijay Jamwal is the Lead DevOps Engineer at Haptik Inc. In their own words, “Haptik is a company that specializes in chatbots, with the flagship product being the Android and iOS personal assistant app with the same name. Other than the consumer app, we also work with enterprises to help build chatbots solutions for customer service, lead generation, marketing and much more.”