AWS Architecture Blog
Field Notes: Speed Up Redaction of Connected Car Data by Multiprocessing Video Footage with Amazon Rekognition
In the blog post Redacting Personal Data from Connected Cars Using Amazon Rekognition, we demonstrated how you can redact personal data such as human faces using Amazon Rekognition. Traversing the video frame by frame and identifying personal information in each frame takes time. That solution works well for small video clips where you do not need a near-real-time response. However, in use cases such as object detection or real-time traffic monitoring, you may need to process this information in near real time and keep up with the input video stream.
In this blog post, we introduce how to leverage “multiprocessing” to speed up the redaction process and provide a response in near real time. We also compare the process run times using a variety of Amazon SageMaker instances to give users various options to process video using Amazon Rekognition.
For example, the ml.c5.4xlarge instance has 16 vCPUs, so we could theoretically have 16 processes working in parallel to process the video stream, which significantly reduces the processing time. Our test against the sample video shows that we reduce the process run time by a factor of about 11 (from 168 s to 15.7 s) using the ml.c5.4xlarge instance.
Architecture Overview
Walkthrough: 6 Steps
1. We will assume that the video data from the car was ingested and is stored in a “Raw” Amazon S3 bucket. (For real-time analytics, video data will likely be ingested from the connected vehicles into Amazon Kinesis Video Streams.)
2. In this architecture we will use an Amazon SageMaker notebook instance, which is a machine learning (ML) compute instance running the Jupyter Notebook App.
3. Additionally, an AWS Identity and Access Management (IAM) role with appropriate permissions provides the temporary security credentials that this program requires.
4. The individual frames are analyzed by calling the “DetectFaces” Amazon Rekognition API, which analyzes and provides metadata about the frame. If a face is detected in the frame, then Amazon Rekognition returns a bounding box per face.
5. We write a function multi_process_video to blur the detected face for each frame and distribute the processing job equally among all available CPUs in the SageMaker instance.
6. We run the multi_process function for the input video and write the output video to an S3 bucket for further analysis.
Detailed Steps
For the six steps mentioned previously, we provide the input video, code samples, and the corresponding output video.
Step 1: Log in to the AWS console with your user credentials.
- Upload the sample video to your S3 bucket.
Name it face1.mp4. I’ve included the following example of the video input.
Step 2: In this block, we will create a SageMaker notebook.
Notebook instance:
- Notebook instance name: VideoRedaction
- Notebook instance type: choose “ml.t3.large” from the drop-down menu
- Elastic inference: None
Permissions:
- IAM role: Select Create a new role from the drop-down menu. This will open a new screen, click next and the new role will be created. The role name will start with AmazonSageMaker-ExecutionRole-xxxxxxxx.
- Root access: Select Enable
- Assume defaults for the rest, and select the orange “Create notebook instance” button at the bottom.
This will take you to the next screen, which shows that your notebook instance is being created. It will take a few minutes and you can monitor the status, which will show a green “InService” state, when the notebook is ready.
Step 3: Next, we need to provide additional permissions to the new role that you created in Step 2.
- Select the VideoRedaction notebook.
This will open a new screen. Scroll down to the third block, “Permissions and encryption”, and click on the IAM role ARN link.
This will open a screen where you can attach additional policies. It will already be populated with “AmazonSageMakerFullAccess”.
- Select the blue Attach policies button.
- This will open a new screen, which will allow you to add permissions to your execution role.
- Under “Filter policies” search for S3full. Check the box next to AmazonS3FullAccess.
- Under “Filter policies” search for Rekognition. Check the boxes next to AmazonRekognitionFullAccess and AmazonRekognitionServiceRole.
- Click the blue Attach policies button at the bottom. This opens a screen that shows the five attached policies, as follows:
- Click on the Add inline policy link on the right and then click on the JSON tab on the next screen. Paste the following policy replacing the <account number> with your AWS account number:
On the next screen enter VideoInlinePolicy for the name and select the blue Create Policy button at the bottom.
Step 3a: Navigate to SageMaker in the console:
- Select “Notebook instances” in the menu on left. This will show your VideoRedaction notebook.
- Select the blue Open Jupyter link under Actions. This will open a new tab titled Jupyter.
Step 3b: In the upper right corner, click the drop-down arrow next to “New” and choose conda_tensorflow_p36 as the kernel for your notebook.
Your screen will look as follows:
Install ffmpeg
First, we need to install ffmpeg for multiprocessing video. It’s a free and open-source software project consisting of a large suite of libraries and programs for handling video, audio, and other multimedia files and streams. We use it to concatenate all the subset videos processed by each vCPU and generate the final output.
Install ffmpeg using the following command:
!conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge --yes
Import libraries – We import additional libraries to help with multi-processing capability.
Step 4: Identify personal data (faces) in the individual frames
The Amazon Rekognition “DetectFaces” operation detects the 100 largest faces in the image. For each face detected, the operation returns face details. These details include a bounding box of the face, a confidence value (that the bounding box contains a face), and a fixed set of attributes such as facial landmarks (for example, coordinates of the eyes and mouth), presence of beard, sunglasses, and so on.
You pass the input image either as base64-encoded image bytes or as a reference to an image in an Amazon S3 bucket. In this code, we pass each frame to Amazon Rekognition as a JPEG image, since we want to analyze every frame of the video. We also show how you can expand the bounding boxes returned by Amazon Rekognition, if required, to blur an enlarged portion of the face.
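The following is a minimal sketch of this step, not the original notebook code. It assumes each frame has already been encoded to JPEG bytes, and the helper names expand_box and detect_face_boxes, as well as the scale parameter, are hypothetical. The boto3 detect_faces call returns each BoundingBox as ratios of the frame size, so the sketch converts them to pixels, enlarges the box around its center, and clamps it to the frame.

```python
def expand_box(box, frame_width, frame_height, scale=1.2):
    """Convert a relative Rekognition BoundingBox to pixel coordinates,
    enlarged around its center by `scale` and clamped to the frame."""
    width = box["Width"] * frame_width
    height = box["Height"] * frame_height
    center_x = box["Left"] * frame_width + width / 2
    center_y = box["Top"] * frame_height + height / 2
    new_width, new_height = width * scale, height * scale
    left = max(0, int(center_x - new_width / 2))
    top = max(0, int(center_y - new_height / 2))
    right = min(frame_width, int(center_x + new_width / 2))
    bottom = min(frame_height, int(center_y + new_height / 2))
    return left, top, right, bottom


def detect_face_boxes(jpeg_bytes, frame_width, frame_height, scale=1.2, client=None):
    """Call DetectFaces on one JPEG-encoded frame and return pixel boxes."""
    if client is None:
        import boto3  # assumes AWS credentials come from the notebook's IAM role
        client = boto3.client("rekognition")
    response = client.detect_faces(Image={"Bytes": jpeg_bytes}, Attributes=["DEFAULT"])
    return [
        expand_box(face["BoundingBox"], frame_width, frame_height, scale)
        for face in response["FaceDetails"]
    ]
```

A scale of 1.0 keeps the box as returned by Amazon Rekognition; larger values blur a wider area around each face.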
Step 5: Redact the face bounding box and distribute the processing among all CPUs
Each process calls the multi_process_video function with its own group_number, which determines the portion of the video that process handles. This distributes the video processing job equally among all available CPUs of the instance and therefore greatly reduces the processing time.
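One way to sketch this distribution is shown below. It is an illustration under assumptions rather than the original implementation: it assumes OpenCV (cv2) is available in the kernel, that the last group absorbs any leftover frames, and that each group writes its own file named after its group number. frame_range, find_faces, and run_all are hypothetical helpers, and find_faces is a placeholder for the Amazon Rekognition call.

```python
import multiprocessing as mp


def frame_range(group_number, total_frames, num_processes):
    """Return the [start, end) frame indices handled by one process."""
    frames_per_group = total_frames // num_processes
    start = group_number * frames_per_group
    # Assumption: the last group also picks up any leftover frames.
    if group_number == num_processes - 1:
        return start, total_frames
    return start, start + frames_per_group


def find_faces(frame):
    """Hypothetical placeholder for the Rekognition call.
    Should return a list of (left, top, right, bottom) pixel boxes."""
    return []


def multi_process_video(task):
    """Blur faces in one group's share of frames and write '<group>.mp4'."""
    import cv2  # assumes OpenCV is installed in the notebook kernel

    group_number, video_path, num_processes = task
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    start, end = frame_range(group_number, total_frames, num_processes)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(f"{group_number}.mp4",
                          cv2.VideoWriter_fourcc(*"mp4v"),
                          cap.get(cv2.CAP_PROP_FPS), size)
    for _ in range(start, end):
        ok, frame = cap.read()
        if not ok:
            break
        for left, top, right, bottom in find_faces(frame):
            if right > left and bottom > top:
                roi = frame[top:bottom, left:right]
                frame[top:bottom, left:right] = cv2.GaussianBlur(roi, (51, 51), 0)
        out.write(frame)
    cap.release()
    out.release()
    print(f"Group {group_number} finished processing!")


def run_all(video_path):
    """Start one worker per CPU; each worker handles one group of frames."""
    num_processes = mp.cpu_count()
    tasks = [(group, video_path, num_processes) for group in range(num_processes)]
    with mp.Pool(num_processes) as pool:
        pool.map(multi_process_video, tasks)
```

Because the workers finish independently, the "finished processing" messages can appear in any order.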
Step 6: Run multi-processing video function and write the redacted video to the output bucket
- We then process the video in parallel and generate the output using the multiprocessing function and ffmpeg from Python.
- We record the name of each video processed by a CPU (‘1.mp4’, ‘2.mp4’, …) in a file called multiproc_files, and then use subprocess to call ffmpeg to concatenate these videos in the order in which they are listed in multiproc_files.
- After the final video is generated, we remove all the intermediate results and upload the face-blurred result to an S3 bucket.
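The concatenate, clean up, and upload sequence described above can be sketched as follows. This is a hedged illustration, not the original code: it assumes the partial videos are named after their group numbers starting at 0 (matching the log output), uses ffmpeg's concat demuxer with stream copy, and the helper names, bucket, and key are hypothetical.

```python
import os
import subprocess


def write_concat_list(num_processes, list_path="multiproc_files"):
    """Write the list file read by ffmpeg's concat demuxer, in group order."""
    names = [f"{group}.mp4" for group in range(num_processes)]
    with open(list_path, "w") as list_file:
        for name in names:
            list_file.write(f"file '{name}'\n")
    return names


def concat_command(list_path, output_path):
    """Build the ffmpeg command that joins the parts without re-encoding."""
    return ["ffmpeg", "-y", "-loglevel", "error",
            "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]


def combine_and_upload(num_processes, output_path, bucket, key):
    """Concatenate the parts, remove intermediates, and upload the result."""
    list_path = "multiproc_files"
    names = write_concat_list(num_processes, list_path)
    subprocess.run(concat_command(list_path, output_path), check=True)
    for name in names + [list_path]:
        os.remove(name)
    import boto3  # assumes the notebook role can write to the bucket
    boto3.client("s3").upload_file(output_path, bucket, key)
```

Stream copy (-c copy) avoids re-encoding the already processed parts, which keeps the concatenation step short relative to the per-frame work.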
Output:
Group 13 finished processing!
Group 15 finished processing!
Group 14 finished processing!
Group 12 finished processing!
Group 11 finished processing!
Group 9 finished processing!
Group 10 finished processing!
Group 1 finished processing!
Group 3 finished processing!
Group 4 finished processing!
Group 8 finished processing!
Group 5 finished processing!
Group 2 finished processing!
Group 7 finished processing!
Group 6 finished processing!
Group 0 finished processing!
Total Process Time: 15.709482431411743 s
Using the same instance, we reduce the process time from 168 s to 15.7 s. As we mentioned, ml.c5.4xlarge has 16 vCPUs, and you can reduce the process time even further if you use an instance that has 32 or 64 vCPUs.
Note: Choosing the right instance will depend on your requirements for process time and cost. As this result demonstrates, multiprocessing video using Amazon Rekognition is an efficient way to combine the state-of-the-art ML models of Amazon Rekognition with powerful multi-core Amazon SageMaker instances.
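As a quick check of these numbers, the short calculation below derives the speedup and the per-vCPU efficiency from the two measured times. The function name is hypothetical; the inputs are the figures reported above.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, workers):
    """Return (speedup, efficiency), where efficiency is speedup per worker."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / workers


speedup, efficiency = speedup_and_efficiency(168.0, 15.709482431411743, 16)
print(f"speedup: {speedup:.1f}x, efficiency per vCPU: {efficiency:.0%}")
```

The speedup comes out at roughly 10.7, which is the source of the "about 11x" figure, and corresponds to about two thirds of ideal linear scaling across 16 vCPUs.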
Comparison of Amazon SageMaker Instances in Terms of Process Time and Cost
Here is the comparison table generated when processing a 6.5-second video with multiple faces on different SageMaker instances. Following is a video screenshot:
The following table shows that instances with 16 vCPUs (4xlarge) are the better options: they process the video faster while remaining cost-efficient.
Depending on the size of your input video file and the requirements for real-time processing, you can break the input video file into smaller chunks and then scale instances to process those chunks in parallel. While this example is focused on blurring faces, you can also use Amazon Rekognition to detect other content, such as someone wielding a gun, someone smoking a cigarette, or suggestive content. These and many other moderation activities are supported by the Amazon Rekognition content moderation APIs.
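For the content moderation case, a minimal sketch of calling the DetectModerationLabels operation on a single JPEG-encoded frame might look like the following. The helper name and the 60 percent confidence threshold are assumptions for illustration only.

```python
def moderation_label_names(jpeg_bytes, min_confidence=60.0, client=None):
    """Return the names of moderation labels detected in one frame."""
    if client is None:
        import boto3  # assumes AWS credentials come from the notebook's IAM role
        client = boto3.client("rekognition")
    response = client.detect_moderation_labels(
        Image={"Bytes": jpeg_bytes},
        MinConfidence=min_confidence,  # assumed threshold; tune for your use case
    )
    return [label["Name"] for label in response["ModerationLabels"]]
```

The same per-frame multiprocessing pattern applies: each worker would call this helper on its own share of frames instead of, or in addition to, the face detection call.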
Conclusion
In this blog post, we showed how you can leverage multiple cores in large machine learning instances, along with Amazon Rekognition. Doing this can significantly speed up the process of redacting personally identifiable information from videos collected by connected vehicles. The ability to provide near-real-time information unlocks additional value from the video that is ingested. For example, in smart cities, information is collected about the environment, such as road traffic and weather. This data can be visualized in near-real-time to help city management make decisions that can optimize traffic and improve residents’ quality of life.