AWS Machine Learning Blog
Deploy Gluon models to AWS DeepLens using a simple Python API
April 2023 Update: Starting January 31, 2024, you will no longer be able to access AWS DeepLens through the AWS Management Console, manage DeepLens devices, or access any projects you have created. To learn more, refer to these frequently asked questions about AWS DeepLens end of life.
Today we are excited to announce that you can deploy your custom models trained using Gluon to your AWS DeepLens. Gluon is an open source deep learning interface that allows developers of all skill levels to prototype, build, train, and deploy sophisticated machine learning models for the cloud, devices at the edge, and mobile apps.
With Gluon, you can build machine learning models using a simple Python API and a range of pre-built, optimized neural network components. This makes it easy to build neural networks with simple code without sacrificing training performance. Gluon makes building new computer vision models easy: just create your model in Amazon SageMaker and deploy it with a single click to your AWS DeepLens, where the model optimizer automatically optimizes it for the best performance on the device.
In this post, we walk you through developing a deep neural network model in Amazon SageMaker to detect the direction of a person's head and deploying it to AWS DeepLens. When there is a person in front of us, we humans can immediately recognize the direction in which that person is looking. For example, the person might be facing straight toward you, or looking somewhere else. This direction is defined as the head pose. We are going to develop a convolutional neural network (CNN) model to estimate the head pose from images of human heads. The different head poses are classified as follows: down right, right, up right, down, middle, up, down left, left, and up left. Detecting the head pose could be used to understand who is paying attention in a classroom setting, to analyze viewer behavior in advertising, and even in driver assistance systems.
Gluon, the imperative interface in Apache MXNet, offers four major advantages over the symbolic MXNet API. First, Gluon offers a full set of plug-and-play neural network building blocks such as predefined layers, optimizers, and initializers. Second, it allows us to bring the training algorithm and the model closer together, which provides flexibility in the development process. Third, it enables developers to define dynamic neural network models that can be built on the fly using Python’s native control flow. Finally, Gluon provides all of these benefits without sacrificing the training speed that the underlying engine provides.
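To make this concrete, here is a minimal sketch of the imperative Gluon style: a toy network assembled from predefined blocks plus a single hand-written training step. It is illustrative only and not part of the head-pose code; the layer sizes, optimizer settings, and dummy data are our own choices.

import mxnet as mx
from mxnet import nd, autograd, gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))   # predefined, plug-and-play layers
    net.add(gluon.nn.Dense(9))                        # e.g., nine head-pose classes
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})

# One imperative training step on a dummy batch; ordinary Python control flow applies.
data = nd.random.uniform(shape=(32, 84 * 84 * 3))
label = nd.zeros((32,))
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)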
Data: Prima Project head-pose image database
First, let’s identify the head-pose dataset we are going to use for the project. For this blog post, we use the Prima Project head-pose images. You will find the original raw data at the following link:
Dataset: http://www-prima.inrialpes.fr/perso/Gourier/Faces/HPDatabase.html
Citation:
N. Gourier, D. Hall, J. L. Crowley
Estimating Face Orientation from Robust Detection of Salient Facial Features
Proceedings of Pointing 2004, ICPR, International Workshop on Visual Observation of Deictic Gestures, Cambridge, UK
There are a total of 2,790 head pose images and their corresponding tilt and pan angle attributes in the dataset. The tilt is defined as the north-south vertical axis, and the pan is defined as the east-west horizontal axis.
The dataset is composed of head-pose data for fifteen different individuals. Thus, there are 186 images for each subject (2,790/15 = 186). In this dataset, the head pose is categorized into 9 and 13 discrete tilt and pan angles, respectively (Tilt angles: -90°, -60°, -30°, -15°, 0°, +15°, +30°, +60°, and +90° from head-down posture to head-up posture. Pan angles: -90°, -75°, -60°, -45°, -30°, -15°, 0°, +15°, +30°, +45°, +60°, +75°, and +90° from the observer’s right to the left). When a subject is looking straight into a camera, both tilt and pan angles are 0°. The original image dimensions are 384 x 288 pixels.
Preprocessing the image data
Next, we preprocess the image data that will be used to train the neural network. A Python script, preprocessingDataset_py2.py, handles the preprocessing. Run the following command to prepare the input data and generate HeadPoseData_trn_test_x15_py2.pkl (6.7 GB):

python2 preprocessingDataset_py2.py

This command generates input images with dimensions of 84 x 84 pixels and their corresponding head-pose angles.
During the preprocessing, applying the same scaling factor to both the height and width of an input image is crucial for the head-pose estimator. If the scaling factors in the two axes differ, the head-pose angles are distorted. We mainly target two different aspect ratios (1:1 and 16:9). (Spoiler alert! The aspect ratio of the full frame on an AWS DeepLens device is 16:9. Thus, if you want to use the entire frame from AWS DeepLens for inference, the aspect ratio of 16:9 is your choice. We used the model trained with the aspect ratio of 1:1 for our final product, which takes only a square portion of the frame for inference.) Inside the preprocessing script, the original head images are cropped and resized to a target image size while the same scaling factor is applied in both orthogonal axes (84 x 84 pixels and 96 x 54 pixels for aspect ratios of 1:1 and 16:9, respectively).
The following figure demonstrates the image preprocessing procedure for a target aspect ratio of 16:9. First, a rectangular crop region of arbitrary size is applied to each image based on the following three criteria: (1) its aspect ratio is 16:9, (2) it fully contains the face region, and (3) it is contained within the image frame. The selected area is then resized to 96 x 54 pixels. This procedure mimics the digital (as well as optical) zoom of a camera.
This preprocessing was repeated 15 times for data augmentation.
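The following is a simplified sketch of that crop-and-resize idea. The actual implementation lives in preprocessingDataset_py2.py; the function name, the random crop-size choice, and the face-centered placement here are our own illustrative simplifications.

import random
import cv2

def crop_and_resize(image, face_box, target_w=96, target_h=54):
    """Crop a region with the target aspect ratio that contains the face and stays
    inside the frame, then resize with the same scale factor on both axes."""
    img_h, img_w = image.shape[:2]
    x0, y0, x1, y1 = face_box                        # face bounding box in pixels
    aspect = float(target_w) / target_h              # e.g., 16:9

    # Pick an arbitrary crop size that is at least as large as the face and no
    # larger than the frame (assumes the face fits at this aspect ratio).
    min_h = max(y1 - y0, int(round((x1 - x0) / aspect)))
    max_h = min(img_h, int(img_w / aspect))
    crop_h = random.randint(min_h, max_h)
    crop_w = int(round(crop_h * aspect))

    # Center the crop on the face, then clamp it so it stays inside the image.
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    left = min(max(0, cx - crop_w // 2), img_w - crop_w)
    top = min(max(0, cy - crop_h // 2), img_h - crop_h)
    crop = image[top:top + crop_h, left:left + crop_w]

    # The same scaling factor in both axes keeps the head-pose angles intact.
    return cv2.resize(crop, (target_w, target_h))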
Head-pose classification
Next, we prepare the label data from the head-pose angles. The head pose is classified into nine categories (the combinations of three tilt and three pan classes). A head pose within ±19.5° in both tilt and pan angles is labeled as the center position (head-pose class 4, tilt class 1, and pan class 1). The rationale behind this threshold angle is that sin(19.5°) is approximately 1/3, so these two angles split the projection of a semicircle (the interval between sin(-90°) and sin(90°)) into three equal segments.
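As a hedged sketch, the angle-to-label conversion could look like the following. The exact class numbering lives in the preprocessing script; the 3 * tilt + pan mapping is our assumption (it reproduces the center position as head-pose class 4).

def angle_to_class(angle_deg):
    """0 = below/right of center, 1 = center, 2 = above/left of center."""
    if angle_deg < -19.5:
        return 0
    elif angle_deg <= 19.5:   # sin(19.5 deg) ~= 1/3, so the bins are equal in sine
        return 1
    else:
        return 2

def head_pose_class(tilt_deg, pan_deg):
    tilt_cls = angle_to_class(tilt_deg)
    pan_cls = angle_to_class(pan_deg)
    return 3 * tilt_cls + pan_cls        # nine combined classes

assert head_pose_class(0, 0) == 4        # looking straight at the camera: tilt 1, pan 1, head pose 4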
Train the ResNet-50 model using Gluon
Creating the nine labels reduces the head-pose problem to a simple image classification task (that is, using an image as input, estimating one head pose out of nine). The model is fine-tuned from a ResNet-50 that we obtained from the MXNet model zoo. There are five main parts in the sample notebook: (1) data loading, (2) additional data augmentation, (3) fine-tuning ResNet-50, (4) validation, and (5) inference. Parts (1), (2), and (3) are especially important for training the model.
Head-Pose Gluon Tutorial Notebook:
https://github.com/aws-samples/headpose-estimator-apache-mxnet/blob/master/HeadPose_ResNet50_Tutorial_Gluon.ipynb
ResNet-50 Model from model zoo
We first download a pre-trained ResNet-50 model. Here is how you load the pre-trained model in Gluon.
Obtain a pre-trained ResNet Model from model zoo
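As a minimal sketch (the exact call is in the tutorial notebook; we assume the ResNet-50 v2 variant from the Gluon model zoo here):

import mxnet as mx
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()                                                  # switch to mx.gpu() on a GPU instance
pretrained_net = vision.resnet50_v2(pretrained=True, ctx=ctx)
print(pretrained_net.output)                                    # the final Dense layer has 1,000 ImageNet outputs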
The ImageNet pre-trained model has 1,000 categorical outputs. However, in our case, we only need nine. Thus, we need to modify the number of output classes to match our labels.
Modify the ResNet 50 model from model zoo
We prepare another ResNet-50 model called “net” that has nine class outputs, and then pass the features from “pretrained_net” onto “net”. Note that “net” is a network in the serial (Gluon) format. This is the model we are going to fine-tune.
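A hedged sketch of this re-heading step, again assuming the ResNet-50 v2 variant from the model zoo:

import mxnet as mx
from mxnet import init
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()
pretrained_net = vision.resnet50_v2(pretrained=True, ctx=ctx)   # 1,000-class ImageNet model

net = vision.resnet50_v2(classes=9)               # same architecture, nine class outputs
net.features = pretrained_net.features            # hand the pretrained features to "net"
net.output.initialize(init.Xavier(), ctx=ctx)     # only the new output layer starts from scratch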
Train the model
In this section we show you two helper functions for the training.
Training helper functions
The first helper method evaluates accuracy during the training, and the other is a training loop. To use the model with AWS DeepLens, save the checkpoint model artifacts in the symbolic format (that is, .json and .params). Because “net” is in the serial format, we use the .export method to save it as a symbolic model artifact. In addition, we want a softmax output layer at the end of our network, so we use the Symbol API to append a softmax output and the .save method to overwrite the .json file.
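A hedged sketch of the two helpers follows; the tutorial notebook has the exact implementations, and the optimizer, learning rate, and checkpoint prefix here are illustrative. It assumes “net” has already been hybridized (see the next section), which .export() requires.

import mxnet as mx
from mxnet import autograd, gluon

def evaluate_accuracy(data_iter, net, ctx):
    """Helper 1: accuracy of "net" over a data iterator."""
    metric = mx.metric.Accuracy()
    for data, label in data_iter:
        data, label = data.as_in_context(ctx), label.as_in_context(ctx)
        metric.update(labels=label, preds=net(data))
    return metric.get()[1]

def train(net, train_iter, val_iter, ctx, epochs=10, lr=1e-3, prefix='model'):
    """Helper 2: training loop that checkpoints in the symbolic (.json/.params) format."""
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
    for epoch in range(epochs):
        for data, label in train_iter:
            data, label = data.as_in_context(ctx), label.as_in_context(ctx)
            with autograd.record():
                loss = loss_fn(net(data), label)
            loss.backward()
            trainer.step(data.shape[0])
        print('epoch %d, validation accuracy %.3f' % (epoch, evaluate_accuracy(val_iter, net, ctx)))

        # Checkpoint in the symbolic format that AWS DeepLens expects.
        net.export(prefix, epoch=epoch)                         # writes <prefix>-symbol.json and .params
        sym = mx.sym.load('%s-symbol.json' % prefix)
        sym = mx.sym.SoftmaxOutput(data=sym, name='softmax')    # append a softmax output layer
        sym.save('%s-symbol.json' % prefix)                     # overwrite the .json file
    return net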
If you want to keep the model in the serial (Gluon) format instead, you can simply use the .save_params method. Here is an example of how you save the Gluon model weights and load them into another Gluon model.
Save the model in the serial (Gluon) format
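For example, a minimal sketch (the fresh ResNet-50 below simply stands in for the fine-tuned network, and the dummy forward pass completes Gluon's deferred initialization so the weights exist before saving):

import mxnet as mx
from mxnet import nd
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v2(classes=9)
net.initialize(mx.init.Xavier())
_ = net(nd.random.uniform(shape=(1, 3, 84, 84)))     # completes deferred initialization
net.save_params('headpose-net.params')               # serial (Gluon) format: weights only, no graph

another_net = vision.resnet50_v2(classes=9)          # must share the same architecture
another_net.load_params('headpose-net.params', ctx=mx.cpu())
# Newer MXNet releases rename these calls to .save_parameters / .load_parameters.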
We now have all the tools necessary to train the model. Let’s start the fine-tuning.
Fine-tune the model
The hybridize method allows us to save the serial (Gluon) model (“net”) in the symbolic format.
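A minimal sketch of that step (the network below again stands in for the fine-tuned “net”; note that .export() needs one forward pass after hybridize() so the symbolic graph exists):

import mxnet as mx
from mxnet import nd
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v2(classes=9)
net.initialize(mx.init.Xavier())
net.hybridize()                                       # switch to graph (symbolic) execution
_ = net(nd.random.uniform(shape=(1, 3, 84, 84)))      # one forward pass builds the cached graph
net.export('headpose', epoch=0)                       # writes headpose-symbol.json and headpose-0000.params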
Hopefully, the rest of the notebook is self-explanatory. After you successfully run the training, you will have a .json file and multiple model weight files (.params) from multiple checkpoints. You have to hand-pick the .params file that gives you the best validation accuracy. We trained the model on a p2.8xlarge Amazon EC2 instance running the AWS Deep Learning AMI (Ubuntu), and we achieved a validation accuracy of ~80% with this dataset.
Train the ResNet-50 model using Amazon SageMaker (Python SDK) with Gluon
So far, we walked through the basics of how to train a CNN using Gluon in Python. The next step is to reproduce the same model training experience on the Amazon SageMaker Python SDK. Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. To use our dataset and code, we’ll write a custom entry point Python script to run on Amazon SageMaker.
S3 bucket
Create an Amazon S3 bucket first if you don’t have one. In this example, we name the S3 bucket “deeplens-sagemaker-0000”, hosted in the N. Virginia (US East 1) AWS Region. (If you want to deploy your trained model artifacts straight to AWS DeepLens, the Region must be N. Virginia (US East 1).)
Inside the bucket, we have a folder named “headpose.” Inside the “headpose” folder, we have four sub-folders named “artifacts,” “customMXNetcodes,” “datasets,” and “testIMs.”
You are going to host the head-pose dataset you created earlier (HeadPoseData_trn_test_x15_py2.pkl) in the datasets folder.
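If you prefer to script the upload rather than use the S3 console, here is a hedged sketch with boto3 (the bucket name matches the example above; substitute your own):

import boto3

s3 = boto3.client('s3')
s3.upload_file('HeadPoseData_trn_test_x15_py2.pkl',                      # local file from preprocessing
               'deeplens-sagemaker-0000',                                # your S3 bucket
               'headpose/datasets/HeadPoseData_trn_test_x15_py2.pkl')    # key inside the datasets folder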
That is it for the preparation.
Amazon SageMaker notebook
Now, launch Amazon SageMaker. After you open an Amazon SageMaker notebook, upload our sample notebook and entry point Python script (HeadPose_SageMaker_PySDK-Gluon.ipynb and EntryPt-headpose-Gluon.py, respectively).
After you place the notebook and entry point script, there are only three steps for you to run the training.
First, specify your S3 bucket name in the sample Amazon SageMaker notebook (HeadPose_SageMaker_PySDK-Gluon.ipynb). In this part, you also specify other folders inside your S3 bucket, such as the “headpose” folder as well as the “artifacts” and “customMXNetcodes” folders underneath it.
Second, specify the training instance and other parameters in the MXNet object. In this example, we use a ml.p2.xlarge instance for the training.
“train_max_run” sets the maximum time, in seconds, that the training instance is allowed to run (432,000 seconds = 5 days), in case the training takes a long time. “train_volume_size” sets the disk volume of the training instance in GB.
You also see that the MXNet object, headpose_estimator, takes the name of the entry point script (i.e., EntryPt-headpose-Gluon.py) as well as folder locations such as “model_artifacts_location” and “custom_code_upload_location”.
We name this job “deeplens-sagemaker-headpose”. The base_job_name will be the prefix of output folders we are going to create. For the model development for AWS DeepLens, it is a good practice to include both “deeplens” and “sagemaker” in the name of the Amazon S3 bucket as well as the name of the job.
Finally, we run the training by calling the “.fit” method, which takes the location of the input dataset as a Python dictionary.
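Putting these three steps together, a hedged sketch of the notebook cells might look like the following. It follows the SageMaker Python SDK as it existed at the time of writing, and the volume size shown is illustrative; see HeadPose_SageMaker_PySDK-Gluon.ipynb for the exact code.

import sagemaker
from sagemaker.mxnet import MXNet

role = sagemaker.get_execution_role()
bucket = 'deeplens-sagemaker-0000'
model_artifacts_location = 's3://{}/headpose/artifacts'.format(bucket)
custom_code_upload_location = 's3://{}/headpose/customMXNetcodes'.format(bucket)

headpose_estimator = MXNet(entry_point='EntryPt-headpose-Gluon.py',
                           role=role,
                           output_path=model_artifacts_location,
                           code_location=custom_code_upload_location,
                           train_instance_count=1,
                           train_instance_type='ml.p2.xlarge',
                           train_max_run=432000,       # seconds (5 days)
                           train_volume_size=100,      # GB of training-instance disk (illustrative)
                           base_job_name='deeplens-sagemaker-headpose')

# .fit takes the input dataset location(s) as a Python dictionary of named channels.
headpose_estimator.fit({'train': 's3://{}/headpose/datasets'.format(bucket)})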
This is how we run the training using Amazon SageMaker Python SDK. You can monitor the progress of the training on either the Amazon SageMaker or the Amazon CloudWatch consoles.
Entry point Python script
All details of the head-pose model training are described in the entry point Python script (EntryPt-headpose-Gluon.py). You may want to look closely at the similarity between EntryPt-headpose-Gluon.py and the HeadPose_ResNet50_Tutorial_Gluon.ipynb notebook that we discussed earlier. They are basically the same, except for some instructions on the directories used to output and save model artifacts (such as model_dir and output_data_dir).
You may also want to compare EntryPt-headpose-Gluon.py and EntryPt-headpose.py, which is the symbolic Apache MXNet version of the head-pose entry point script. The notable difference between the two entry point Python scripts is that the Gluon script has two additional functions (save and model_fn).
Because we want to develop the model for AWS DeepLens, we need to save the model in the symbolic format. However, “net” in the function “train” is in the serial Gluon format. The function “save” accepts the return value of “train”, saves the model in any format that you want, and places the model artifact in model.tar.gz. Without “save”, Amazon SageMaker automatically saves “net” in its default format.
The function “model_fn”, whose signature is def model_fn(model_dir), serves a related purpose: it bridges the discrepancy between the format of the trained network and the format of the saved model artifact when the model is hosted for inference.
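A hedged sketch of the two hooks follows. The argument names and the exact loading code are assumptions based on the MXNet container conventions of the time, so refer to EntryPt-headpose-Gluon.py for the authoritative version.

import os
import mxnet as mx

def save(net, model_dir):
    """Called with the return value of train(); writes the Gluon network out in the
    symbolic format (model-symbol.json + model-0000.params) inside model_dir."""
    net.export(os.path.join(model_dir, 'model'), epoch=0)
    sym = mx.sym.load(os.path.join(model_dir, 'model-symbol.json'))
    sym = mx.sym.SoftmaxOutput(data=sym, name='softmax')         # same softmax trick as before
    sym.save(os.path.join(model_dir, 'model-symbol.json'))

def model_fn(model_dir):
    """Called at hosting time; loads the symbolic artifacts back for inference."""
    sym, arg_params, aux_params = mx.model.load_checkpoint(os.path.join(model_dir, 'model'), 0)
    mod = mx.mod.Module(symbol=sym, label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 84, 84))])
    mod.set_params(arg_params, aux_params, allow_missing=True)
    return mod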
After you successfully run the training, you will have model.tar.gz in the output folder inside the artifacts folder. Inside model.tar.gz, you will find the pair of model-symbol.json and model-0000.params files that produced the best validation accuracy during the training.
If you have an AWS DeepLens account, you can find the model artifacts you just developed in the AWS DeepLens console.
In the AWS DeepLens console, choose Models, select “Amazon SageMaker trained model,” and scroll down to the Job IDs. You can immediately deploy the model to your AWS DeepLens device.
Conclusion
In this blog post, we developed a head-pose estimator CNN model with the Gluon interface in Apache MXNet on Amazon SageMaker and deployed it to an AWS DeepLens device. We also dove deep into the difference between the symbolic and serial model formats and showed you how to handle them for your own application. The symbolic-interface version of the same application is also provided in the GitHub repo (HeadPose_ResNet50_Tutorial.ipynb, HeadPose_SageMaker_PySDK.ipynb, and EntryPt-headpose.py).
The Amazon SageMaker Python SDK allows you to bring your custom Apache MXNet or Gluon script and dataset and makes it easy to train, deploy, and test Deep Learning models.
About the Authors
Tatsuya Arai PhD is a biomedical engineer turned deep learning oriented data scientist at Amazon ML Solutions Lab. He believes that the power of AI isn’t exclusively for computer scientists or mathematicians.
Vikram Madan is a Senior Product Manager for AWS Deep Learning. He works on products that make deep learning engines easier to use with a specific focus on the open source Apache MXNet engine. In his spare time, he enjoys running long distances and watching documentaries.
Eddie Calleja is a Software Development Engineer for AWS Deep Learning. He is one of the developers of the DeepLens device. As a former physicist he spends his spare time thinking about applying AI techniques to modern day physics problems.
Brad Kenstler is a Data Scientist on the AWS Deep Learning Team. As part of the AWS ML Solutions Lab, he helps customers adopt ML & AI within their own organization through educational workshops and custom modeling. Outside of work, Brad enjoys listening to heavy metal and bourbon tasting.
Sunil Mallya is a Senior Solutions Architect in the AWS Deep Learning team. He helps our customers build machine learning and deep learning solutions to advance their businesses. In his spare time, he enjoys cooking, sailing and building self driving RC autonomous cars.
Jyothi Nookula is a Senior Product Manager for AWS DeepLens. She loves to build products that delight her customers. In her spare time, she loves to paint and host charity fund raisers for her art exhibitions.