Reducing transcription costs by 60% using AWS AI/ML services


In the world of law enforcement and justice, every piece of evidence has a significant impact. From recorded testimonies to jail cell conversations, audiovisual files can play an important role in accurately documenting evidence that can make or break a case. Delivering these files in an easily accessible manner enables district attorneys to thoroughly evaluate and analyze evidence for presentation in court.

Challenges

The process of transcribing video or audio files has traditionally been manual and time-consuming, leading to error-prone and costly services that can negatively affect the integrity of legal proceedings. Beyond the need for accurate and cost-effective transcriptions, attorneys have determined a need for timestamping capabilities, speaker identification, search and replace capabilities, the highlighting of specific words, editing capabilities, and most importantly, shortened turnaround times.

Solution overview

To address the need for quicker turnaround and more accurate transcription of audiovisual files, the Contra Costa County (CCC) District Attorney’s (DA) Office reached out to Amazon Web Services (AWS) and partnered with AWS Partner ScaleCapacity to develop a solution that would automate the manual transcription process. The solution needed to be able to identify words with low confidence scores and specific words of interest on a case-by-case basis.

The CCC DA’s Office determined that the best approach was to build a cloud-based, serverless solution that provides a secure, efficient, and scalable system with a user-friendly interface. The three parties collaborated to identify application and functionality requirements and developed the application according to CCC requirements. The application is integrated with Microsoft Azure AD for single sign-on authentication.

Figure 1 provides a high-level view of the application flow.

Figure 1. High-level workflow of the solution described in this post.

The solution includes the following AWS services:

Amazon CloudFront – A content delivery network (CDN) service built for high performance, security, and developer convenience
Amazon Cognito – A service that helps you implement customer identity and access management into your web and mobile applications
Amazon EventBridge – A serverless event bus that helps you receive, filter, transform, route, and deliver events
Amazon CloudWatch – A service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health
Amazon Transcribe – A fully managed, automatic speech recognition (ASR) service
AWS Lambda – A serverless computing service
Amazon API Gateway – A fully managed service to create, publish, and manage APIs at scale
Amazon Simple Storage Service (Amazon S3) – A highly scalable and durable object storage service
Amazon DynamoDB – A fully managed NoSQL database service

Access to the application is provided through single sign-on (Azure AD integrated with Amazon Cognito). A CloudFront distribution serves the application, and Amazon Cognito acts as the identity provider for authentication and authorization.

Figure 2 provides a high-level view of the application infrastructure architecture.

Figure 2. High-level architecture diagram of the solution described in this post.
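As one illustration of how the AWS Lambda and Amazon Transcribe pieces of this architecture might fit together, the following minimal sketch builds the request parameters for a transcription job with speaker identification enabled. The post does not describe the actual implementation, so the function name, bucket names, object key, and default speaker count are hypothetical; only the parameter names follow the Amazon Transcribe `StartTranscriptionJob` API.

```python
import re
from typing import Any


def build_transcription_request(
    bucket: str,
    key: str,
    job_name: str,
    output_bucket: str,
    max_speakers: int = 4,          # assumed default, not from the post
    language_code: str = "en-US",   # assumed default, not from the post
) -> dict[str, Any]:
    """Build keyword arguments for Amazon Transcribe's StartTranscriptionJob.

    This helper is hypothetical; it only assembles the request and makes
    no AWS calls itself.
    """
    # Transcribe job names may contain only letters, digits, '.', '_' and '-',
    # so any other character is replaced with a hyphen.
    safe_name = re.sub(r"[^0-9A-Za-z._-]", "-", job_name)
    return {
        "TranscriptionJobName": safe_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": language_code,
        "OutputBucketName": output_bucket,
        "Settings": {
            # Speaker identification: each word is attributed to spk_0, spk_1, ...
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        },
    }


if __name__ == "__main__":
    params = build_transcription_request(
        bucket="evidence-bucket",            # hypothetical bucket name
        key="case-42/interview.mp4",         # hypothetical object key
        job_name="case 42 interview",
        output_bucket="transcripts-bucket",  # hypothetical bucket name
    )
    print(params)
```

Inside a Lambda function, a dictionary like this could be passed to `boto3.client("transcribe").start_transcription_job(**params)`. Keeping the request construction separate from the AWS call makes it possible to test without credentials.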

Solution walkthrough: Reducing transcription costs using AWS AI/ML services

The solution works as follows:

Attorneys and case analysts upload audio and video evidence collected using a variety of methods (body-worn cameras, recorded interviews, jail calls) through the user interface. The user identifies specific words of interest to be highlighted in the document and initiates the transcription process.
Within a few seconds, a transcription document is generated, which not only identifies words of interest specified by the user but also identifies transcribed words with low confidence scores for further confirmation.
Once transcribed, the user has access to an interface to view and edit transcribed data along with a media player to play audiovisual content.
The transcription includes speaker ID assignments, leaving speaker names to be populated by the user.
The corresponding transcribed data is highlighted as the media file plays.
Timestamps on the transcribed content correlate to timestamps on the uploaded media file, allowing side-by-side comparison and the ability to edit the transcribed document.
Search and search-and-replace features speed up editing.
Additional features include keystrokes to play, pause, rewind, and fast forward.
The final document can be downloaded as a Word document to the user’s desktop.
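The highlighting behavior described in these steps can be sketched against the JSON that Amazon Transcribe produces, in which each recognized word carries a confidence score and start and end times. The example below is an assumption-laden illustration, not the application's actual code: the 0.80 threshold, the `Word` class, the helper names, and the hand-written sample transcript are all hypothetical, and the per-item `speaker_label` field is read defensively in case it is absent.

```python
from dataclasses import dataclass
from typing import Optional

LOW_CONFIDENCE_THRESHOLD = 0.80  # assumed cutoff; the post does not state one


@dataclass
class Word:
    text: str
    start: float
    end: float
    confidence: float
    speaker: str
    low_confidence: bool
    of_interest: bool


def annotate_words(
    transcript: dict,
    words_of_interest: set[str],
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
) -> list[Word]:
    """Flag low-confidence words and user-specified words of interest."""
    wanted = {w.lower() for w in words_of_interest}
    annotated = []
    for item in transcript["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # Transcribe emits numbers as strings
        annotated.append(
            Word(
                text=best["content"],
                start=float(item["start_time"]),
                end=float(item["end_time"]),
                confidence=confidence,
                speaker=item.get("speaker_label", "unknown"),
                low_confidence=confidence < threshold,
                of_interest=best["content"].lower() in wanted,
            )
        )
    return annotated


def word_at(words: list[Word], playback_seconds: float) -> Optional[Word]:
    """Return the word being spoken at a media-player position, if any."""
    for word in words:
        if word.start <= playback_seconds < word.end:
            return word
    return None


# Hand-written sample in the shape of Transcribe output (not real case data).
SAMPLE_TRANSCRIPT = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.00", "end_time": "0.35",
             "speaker_label": "spk_0",
             "alternatives": [{"confidence": "0.99", "content": "The"}]},
            {"type": "pronunciation", "start_time": "0.35", "end_time": "0.90",
             "speaker_label": "spk_0",
             "alternatives": [{"confidence": "0.62", "content": "warehouse"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
}

if __name__ == "__main__":
    for w in annotate_words(SAMPLE_TRANSCRIPT, {"warehouse"}):
        print(w)
```

A user interface could use `low_confidence` and `of_interest` to choose highlight colors, and call something like `word_at` on each media-player time update to highlight the word currently being played.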

Next steps and enhancements

Current enhancement plans include transcription of non-English audiovisual files with the capability of translating the transcribed content to English.

The developed solution supports the most common audiovisual formats, allows for batch uploads of 10 raw audiovisual files, and supports transcription of audiovisual files in multiple languages. Future enhancements will provide the option to translate documents from another language into English.
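A batch-upload check along these lines could be sketched as follows. This is only an illustration under stated assumptions: the post does not list the accepted formats, so the extension set below is taken from the media formats Amazon Transcribe documents, the limit of 10 is read as a per-batch maximum, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

MAX_BATCH_SIZE = 10  # the post mentions batch uploads of 10 files

# Assumption: formats accepted by Amazon Transcribe, not the application's own list.
SUPPORTED_EXTENSIONS = {
    ".amr", ".flac", ".m4a", ".mp3", ".mp4", ".ogg", ".webm", ".wav",
}


def validate_batch(filenames: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the batch is acceptable."""
    problems = []
    if len(filenames) > MAX_BATCH_SIZE:
        problems.append(
            f"batch has {len(filenames)} files; the limit is {MAX_BATCH_SIZE}"
        )
    for name in filenames:
        # Compare extensions case-insensitively so "CALL.WAV" is accepted.
        extension = PurePosixPath(name).suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {extension or '(none)'}")
    return problems


if __name__ == "__main__":
    print(validate_batch(["interview.mp4", "jail_call.WAV", "notes.docx"]))
```

Returning a list of problems rather than raising on the first one lets the interface report every rejected file in a single pass.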

Customer response

The Contra Costa County DA’s Office is thrilled with the application. The faster, user-friendly interface was embraced by personnel, and an immediate cost savings of about 60 percent was realized. The settings were easily configured to meet compliance requirements, including a customized retention period for uploaded and transcribed files. The transcription application improved CCC’s transcription turnaround time from days to minutes with about 80 percent accuracy prior to editing, maintaining the reliability of the evidence presented in court with lower costs and greater speed.

Conclusion

By working with AWS and AWS Partner ScaleCapacity, the Contra Costa County District Attorney’s Office achieved its goal of reducing turnaround times and improving the accuracy of audiovisual file transcription at lower cost. To learn more about how to use AWS services and partners to improve speed and accuracy in transcription, contact your AWS account representative.

AWS Public Sector Blog