Using Amazon Voice Focus AMI to reduce noise in audio
The Amazon Chime SDK team has launched the Amazon Voice Focus AMI to help customers reduce noise and improve the quality of their audio content. Amazon Voice Focus is an award-winning, deep-learning noise suppression algorithm used in Amazon Chime SDK meetings. It is now packaged as an Amazon Linux 2 (AL2) Amazon Machine Image (AMI). The Amazon Voice Focus AMI helps builders, content creators, and media producers reduce background noise such as fans, lawnmowers, and barking dogs, as well as foreground noise like typing and shuffling papers.
Amazon Voice Focus is available as a feature in the Amazon Chime SDK and helps customers reduce background and foreground noise in audio and video sessions created in Amazon Chime SDK meetings. Customers have shown interest in using our audio algorithm in a variety of other use cases, such as podcast and live streaming audio cleanup. Most of these non-meeting workloads do not necessarily use open-source WebRTC and often involve proprietary media pipelines. The Amazon Voice Focus AMI with FFmpeg integration gives customers the flexibility and control to integrate the algorithm into any media pipeline.
In this blog post, we walk you through two use cases that show how to integrate the Amazon Voice Focus AMI with your existing audio processing workloads.
In the first example, we demonstrate a live streaming use case using RTMP, a popular media streaming protocol that is widely accepted by streaming platforms (e.g., YouTube, Facebook Live, Twitch). The Amazon Voice Focus AMI receives the RTMP stream from the publisher, reduces noise in the audio, and then forwards the RTMP stream to the destination streaming platform. In this example, we use Twitch as the destination.
In the second example, we demonstrate a file-based processing use case, where podcasters can clean up their audio recordings before sharing them with listeners. In this use case, audio from the user’s Amazon Simple Storage Service (Amazon S3) bucket will be processed by the Amazon Voice Focus AMI, and the cleaned-up audio can be stored in the same or another Amazon S3 bucket.
First example: Clean up live RTMP audio stream using Amazon Voice Focus AMI
Real-Time Messaging Protocol (RTMP) is a TCP-based protocol designed to maintain persistent, low-latency connections, and it is widely used for streaming audio, video, and data over the internet. Most media servers and streaming platforms can receive it, including YouTube, Facebook Live, Twitch, LinkedIn, and Periscope. A common live streaming setup uses Open Broadcaster Software (OBS) on your local laptop to capture media (audio and video) and stream it to a streaming platform of your choice via RTMP. The media captured by OBS often contains unwanted noise, which reduces the audio quality of the live stream. In this example, we show you how to set up OBS on your local laptop and configure an Amazon Elastic Compute Cloud (Amazon EC2) instance running the Amazon Voice Focus AMI to receive the RTMP media stream, clean up the audio, and then forward the stream to the streaming platform endpoint.
To let the Amazon Voice Focus AMI receive the RTMP stream, you need to configure an inbound rule in your Amazon EC2 security group that opens TCP port 1935. Follow this guide to configure the security group: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html#add-rule-authorize-access
Add a rule to a security group for inbound traffic over IPv4 (console)
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- In the navigation pane, choose Instances.
- Select your instance and, in the bottom half of the screen, choose the Security tab. Security groups lists the security groups that are associated with the instance. Inbound rules displays a list of the inbound rules that are in effect for the instance.
- For the security group to which you’ll add the new rule, choose the security group ID link to open the security group.
- On the Inbound rules tab, choose Edit inbound rules.
- On the Edit inbound rules page, do the following:
- Choose Add rule.
- For Type, choose Custom TCP. For Port range, enter 1935.
- For Source, choose My IP to automatically populate the field with the public IPv4 address of your local computer. Alternatively, for Source, choose Custom and enter the public IPv4 address of your computer or network in CIDR notation. If you want to allow any public IPv4 address to connect to your Amazon EC2 instance, you can choose Anywhere-IPv4 for Source. Please configure this based on your specific network settings, security requirements and use cases.
- Choose Save rules.
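If you prefer the command line, the console steps above can be approximated with the AWS CLI. The following is a minimal sketch, not part of the original walkthrough: the function name rtmp_ingress_cmd, the security group ID, and the CIDR are placeholders chosen for illustration. The function only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch: print the AWS CLI call that opens TCP port 1935 for a single CIDR.
# The group ID and CIDR used below are placeholders, not values from this post.

rtmp_ingress_cmd() {
  local group_id="$1" cidr="$2"
  # Loose shape check for IPv4 CIDR notation (e.g. 203.0.113.10/32).
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$'
  if [[ ! "$cidr" =~ $re ]]; then
    echo "error: '$cidr' is not IPv4 CIDR notation" >&2
    return 1
  fi
  echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port 1935 --cidr $cidr"
}

# Print the command for review; paste it into your shell to apply the rule.
rtmp_ingress_cmd "sg-0123456789abcdef0" "203.0.113.10/32"
```

Restricting the CIDR to your own public address (a /32) corresponds to choosing My IP in the console.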
Next, install an RTMP server on the Amazon EC2 instance running the Amazon Voice Focus AMI to capture the live stream from an external source. NGINX is free and open-source software that can serve as an RTMP server through the nginx-rtmp-module (https://github.com/arut/nginx-rtmp-module). The Amazon Voice Focus AMI includes sample scripts that install NGINX and use it with FFmpeg.
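This post does not show the configuration that the sample script installs. As a rough illustration only, an nginx-rtmp-module configuration that accepts a live stream on port 1935 under an application named videotest (the path used in the stream URLs in this example) might look like the fragment printed below. The function name print_rtmp_conf is hypothetical, and the fragment is an assumption rather than the AMI's actual file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print an nginx-rtmp-module fragment similar to what an
# RTMP ingest server needs. This is NOT the configuration shipped with the AMI.

print_rtmp_conf() {
  cat <<'EOF'
rtmp {
    server {
        listen 1935;              # RTMP's default TCP port
        application videotest {   # matches the /videotest path in the stream URL
            live on;              # accept live streams from publishers
            record off;           # do not write the incoming stream to disk
        }
    }
}
EOF
}

print_rtmp_conf
```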
Sample instructions to use FFmpeg with VoiceFocus filter for live-streaming
- Install NGINX and the RTMP module on the Amazon EC2 instance hosting the Amazon Voice Focus AMI
- Run cd /home/ec2-user/examples/scripts/nginx && ./build_nginx.sh to install NGINX
- Start the NGINX server on the Amazon EC2 Instance: sudo /usr/local/nginx/sbin/nginx
- To stop the NGINX server on the Amazon EC2 instance: sudo /usr/local/nginx/sbin/nginx -s stop
- Your local machine needs a microphone and camera. You can use the free and open-source OBS software on your local machine to capture live audio and video and stream it to the NGINX server hosted on the Amazon EC2 instance. Here are the required OBS settings:
- Settings → Audio → General, select “Mono” for Channels
- Settings → Stream:
- Service: Custom
- Server: rtmp://<your_ec2_public_IPv4_address>/videotest
- Stream Key: <your_custom_stream_name>
- Click “OK” to save your settings
- Set up an RTMP endpoint (e.g., YouTube Live, Twitch) to receive the denoised live stream (e.g., Twitch San Francisco RTMP endpoint: rtmp://sfo.contribute.live-video.net/app/{Twitch_Stream_Key}). Refer to this article to find the RTMP endpoint of your streaming platform.
- Start streaming from OBS on the local machine
- Invoke FFmpeg with VoiceFocus on the Amazon EC2 instance. FFmpeg reads its input from the NGINX server, processes the audio with the VoiceFocus audio filter, and then streams the output to the RTMP endpoint of your choice. Note that <your_custom_stream_name> must match the one you chose in step 2 iii, and <rtmp_endpoint_url> is the one you set up in step 3.
ffmpeg -y -i rtmp://localhost:1935/videotest/<your_custom_stream_name> \
  -c:v copy -c:a aac -ac 1 -af "voicefocus" -f flv <rtmp_endpoint_url>
That's it! When you view the live stream broadcast on Twitch, you should hear the audio with reduced noise.
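If you start the relay often, it can help to assemble the FFmpeg command from two parameters instead of editing it by hand. The sketch below is one way to do that. The function name build_relay_cmd and the example values are hypothetical, and the function prints the command rather than running it. The FFmpeg options are the ones used in the command from the last step above.

```shell
#!/usr/bin/env bash
# Sketch: build the relay command from a stream name and a destination endpoint.
# Nothing is executed here; the command is printed for review.

build_relay_cmd() {
  local stream_name="$1" endpoint="$2"
  [[ -n "$stream_name" ]] || { echo "error: stream name is empty" >&2; return 1; }
  case "$endpoint" in
    # rtmps:// only works if the FFmpeg build includes TLS support.
    rtmp://*|rtmps://*) ;;
    *) echo "error: endpoint must start with rtmp:// or rtmps://" >&2; return 1 ;;
  esac
  # Video is copied untouched; audio is downmixed to mono, denoised, and re-encoded as AAC.
  printf "ffmpeg -y -i '%s' -c:v copy -c:a aac -ac 1 -af voicefocus -f flv '%s'\n" \
    "rtmp://localhost:1935/videotest/${stream_name}" "$endpoint"
}

# Example with placeholder values:
build_relay_cmd "my_stream" "rtmp://sfo.contribute.live-video.net/app/MY_STREAM_KEY"
```

Quoting both URLs keeps a stream key that contains shell metacharacters from being interpreted by the shell.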
Second example: Clean up audio recordings using Amazon Voice Focus AMI
In many cases, customers already have a set of audio recordings that contain unwanted noise and need to be cleaned up to improve speech intelligibility. Examples include, but are not limited to, media recordings made by journalists in the field, recordings made by podcasters outside a studio with less-than-ideal gear, and recordings captured on mobile devices in noisy environments such as a street, a subway station, or a supermarket. In such recordings, it is often desirable to remove the unwanted noise while retaining the voices at high quality. The Amazon Voice Focus AMI can help in these use cases.
You can use the example scripts located in ~/examples/scripts/samples to run the voicefocus_demo executable and process various input media formats. Please note that the audio output is always sampled at 48 kHz, the full audio bandwidth.
Reduce noise in audio files stored locally on the Amazon EC2 instance
If you have audio WAV files stored locally on the Amazon EC2 instance that uses the Amazon Voice Focus AMI, simply run the following script to invoke Amazon Voice Focus processing on the file:
./vf-wav-local.sh <path/to/input_wav> <path/to/output_wav>
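To process a whole directory of recordings, you can wrap the script in a loop. The following is a minimal sketch that assumes the script takes an input path and an output path exactly as shown above. The function name denoise_dir and the VF_SCRIPT variable are introduced here for illustration and are not part of the AMI.

```shell
#!/usr/bin/env bash
# Sketch: run the local WAV script over every .wav file in a directory.
# VF_SCRIPT defaults to the script name from this post; override it if your path differs.
VF_SCRIPT="${VF_SCRIPT:-./vf-wav-local.sh}"

denoise_dir() {
  local in_dir="$1" out_dir="$2" wav count=0
  mkdir -p "$out_dir" || return 1
  for wav in "$in_dir"/*.wav; do
    [[ -e "$wav" ]] || continue   # the glob matched nothing
    # Keep the original file name, written into the output directory.
    "$VF_SCRIPT" "$wav" "$out_dir/$(basename "$wav")" || {
      echo "failed: $wav" >&2
      return 1
    }
    count=$((count + 1))
  done
  echo "processed $count file(s)"
}

# Usage (run from ~/examples/scripts/samples):
#   denoise_dir /path/to/noisy /path/to/clean
```

The loop stops at the first failure so that a broken input file is noticed rather than silently skipped.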
Reduce noise in audio files stored in an Amazon S3 bucket
Before you can reduce noise in audio files stored in an Amazon S3 bucket, you need to set up an AWS Identity and Access Management (IAM) role or policy that grants access to the bucket, and configure the Virtual Private Cloud (VPC) and subnet settings so that the instance has the public network access it needs. Refer to the Amazon Voice Focus AMI Developer Guide and the AWS documentation for details.
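As a starting point, an IAM policy for this workflow typically allows reading objects from the input bucket and writing objects to the output bucket. The sketch below prints such a policy document. The function name s3_policy_json and the bucket names are hypothetical, and the exact set of actions the sample scripts require is an assumption here, so treat the Developer Guide as authoritative.

```shell
#!/usr/bin/env bash
# Sketch: print a least-privilege style IAM policy for one input and one output bucket.
# Which actions the AMI's scripts actually need is an assumption; check the Developer Guide.

s3_policy_json() {
  local in_bucket="$1" out_bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${in_bucket}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::${out_bucket}/*"
    }
  ]
}
EOF
}

# Example with placeholder bucket names:
s3_policy_json "my-noisy-audio" "my-clean-audio"
```

Passing the same bucket name twice covers the case where the cleaned-up audio is stored alongside the original.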
Process an audio WAV file in an Amazon S3 bucket
To invoke Amazon Voice Focus processing on an audio WAV file stored in an Amazon S3 bucket, provide the Amazon S3 URI of the input file and the desired Amazon S3 URI for the output WAV file to the script as follows:
./vf-wav-S3.sh <input_s3_uri> <output_s3_uri>
Process an mp4 file stored in an Amazon S3 bucket
Similarly, to invoke Amazon Voice Focus processing on the audio content of a video (mp4) file stored in an Amazon S3 bucket, run the following script:
./vf-mp4-s3.sh <input_s3_uri> <output_s3_uri>
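When a bucket holds a mix of WAV and mp4 files, a small dispatcher can choose the matching script from the file extension. This is a sketch under the assumption that the two script names are spelled exactly as in the commands above. The function names pick_vf_script and denoise_s3_cmd are hypothetical, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: choose the S3 processing script from the input URI's extension.

pick_vf_script() {
  local uri="$1" lower
  # Lowercase a copy of the URI so that .WAV and .MP4 are matched too.
  lower="$(printf '%s' "$uri" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    s3://*.wav) echo "./vf-wav-S3.sh" ;;
    s3://*.mp4) echo "./vf-mp4-s3.sh" ;;
    *) echo "error: unsupported input '$uri'" >&2; return 1 ;;
  esac
}

denoise_s3_cmd() {
  local in_uri="$1" out_uri="$2" script
  script="$(pick_vf_script "$in_uri")" || return 1
  echo "$script $in_uri $out_uri"
}

# Example with placeholder URIs:
denoise_s3_cmd "s3://my-noisy-audio/episode1.wav" "s3://my-clean-audio/episode1.wav"
```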
Conclusion
In this blog post, we walked through two basic examples of using the Amazon Voice Focus AMI to clean up audio: live streaming using RTMP and file-based processing. In addition to these two use cases, the Amazon Voice Focus shared library for Linux and its C header file are available in the system paths of the Amazon Voice Focus AMI, so customers can build their own applications on top of it to reduce unwanted noise and "focus" on human voices. If your media processing workloads already run on Amazon EC2 instances, you only need to run the Amazon Voice Focus AMI on those instances to use the award-winning Amazon Voice Focus noise suppression technology to clean up your audio.
Additional resources
Getting Started With Amazon Voice Focus AMI
Amazon Chime SDK Features (see Speech Enhancement)