Build, train, and deploy Amazon Fraud Detector models using the open source Python SDK

Companies providing digital services are looking for ways to effectively identify fraudulent activities, such as online payment fraud and fake account creation. Amazon Fraud Detector is a fully managed service that uses machine learning (ML) and builds on 20 years of fraud detection expertise from Amazon Web Services (AWS) and Amazon.com to automatically identify potentially fraudulent activity. To enable machine learning operations (MLOps) and orchestrate the build, train, and deploy process, we present an open-source Amazon Fraud Detector SDK for Python. The library is meant to help with rapid prototyping at a programmatic level, and it also supports programmatic deployment, model retraining, and batch prediction.

Batch prediction is especially of interest when the data used for inference is tabular. You can use the SDK for Python in any runtime environment with Python 3. Because it is a Python library downloadable from PyPI, you can also use it in a Docker container when orchestrating your MLOps pipeline. In this blog post, we show you a step-by-step guide for using the open-source Amazon Fraud Detector SDK for Python.

Prerequisites

Set IAM permissions

In our example, we use Amazon SageMaker to work with the open-source SDK for Python. However, the code below works on any machine with Python 3. Make sure that the user you are signed in with has access rights to Amazon Fraud Detector, as shown below for SageMaker. To use the Amazon Fraud Detector SDK for Python from a SageMaker notebook, you first need to grant the notebook permission to call the Amazon Fraud Detector APIs.

We assume that you have already created an Amazon SageMaker notebook instance. The instance is automatically associated with an AWS Identity and Access Management (IAM) execution role. In order to find the role that is attached to your instance, you can click on the instance name in the SageMaker console. Scroll down on the next screen to Permissions and encryption. You can identify the role as the hyperlink that brings you to the IAM console:

Screenshot of the Amazon SageMaker console with the IAM role settings highlighted

Next, attach the Amazon Fraud Detector managed policy that grants full access to the service. After you click the role above and the IAM console opens in a new tab, click Attach policies on the left side of the screen. When all policies are listed, select AmazonFraudDetectorFullAccessPolicy and click Attach policy at the bottom right.
Screenshot showing how to Attach the policy.

You are now ready to use the SDK for Python on your Amazon SageMaker notebook instance.
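
If you prefer to attach the policy programmatically rather than through the console, the step above could be sketched as follows. This is a minimal sketch and not part of the Fraud Detector SDK: it assumes an IAM client such as `boto3.client("iam")` whose credentials are allowed to call `attach_role_policy`. The helper names, the example role ARN, and the managed policy ARN are assumptions, so verify the policy ARN in the IAM console before relying on it.

```python
# Assumed ARN of the AWS managed policy named in this post; verify it in the IAM console.
FRAUD_DETECTOR_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonFraudDetectorFullAccessPolicy"


def role_name_from_arn(role_arn: str) -> str:
    """Extract the role name from an ARN like arn:aws:iam::<account>:role/<path>/<name>."""
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("role/"):
        raise ValueError(f"not a role ARN: {role_arn}")
    # The role name is the last path segment of the resource part.
    return parts[5].rsplit("/", 1)[1]


def attach_fraud_detector_policy(iam_client, role_arn: str) -> str:
    """Attach the managed policy to the role and return the role name (hypothetical helper)."""
    role_name = role_name_from_arn(role_arn)
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=FRAUD_DETECTOR_POLICY_ARN)
    return role_name


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   attach_fraud_detector_policy(
#       boto3.client("iam"),
#       "arn:aws:iam::123456789012:role/service-role/MySageMakerRole")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before running it against a real account.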

Getting started with the SDK

The open-source SDK for Python is built to help you on your Amazon Fraud Detector journey and introduces some functionalities that support working with the service. For example, it can create and push a manifest file or check your image for compliance with the service limits.

Before we continue with the setup, it’s important to understand the service at a high level. The service expects tabular data. This data varies from use case to use case, but it can contain columns such as names, email addresses, IP addresses, and other fraud-related attributes, as seen in this example Jupyter notebook.

If you are developing on your local computer or on any instance other than the SageMaker environment used here, you can copy the code below or review this example notebook:

# Training
INPUT_BUCKET = "YOUR_S3_BUCKET_FOR_TRAINING"
DETECTOR_NAME = "YOUR_DETECTOR_NAME"
MODEL_NAME = "YOUR_MODEL_NAME"
ENTITY_TYPE = "YOUR_ENTITY_TYPE" # e.g. "transaction"
EVENT_TYPE = "YOUR_EVENT_TYPE" # e.g. "credit-card-transaction"
MODEL_TYPE = "ONLINE_FRAUD_INSIGHTS"
MODEL_VERSION = "1" # leave this as one if you start right at the beginning
DETECTOR_VERSION = "1" # leave this as one if you start right at the beginning
REGION = "THE_REGION"

We will use these variables as:

INPUT_BUCKET: the Amazon Simple Storage Service (Amazon S3) bucket that contains your tabular data to train a model
DETECTOR_NAME: the unique name of the Amazon Fraud Detector detector
MODEL_NAME: the unique name of the Amazon Fraud Detector model
ENTITY_TYPE: the unique name of the Amazon Fraud Detector entity stored
EVENT_TYPE: the unique name of the Amazon Fraud Detector event type
MODEL_TYPE: the model type
MODEL_VERSION: the model version you want to deploy (note: when starting fresh, “1” is the default)
DETECTOR_VERSION: the detector version you want to deploy (note: when starting fresh, “1” is the default)
REGION: the AWS Region where Amazon Fraud Detector should be deployed
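
In an MLOps pipeline, you may prefer not to hardcode these values. The following is a minimal sketch, assuming hypothetical environment variable names prefixed with `AFD_` (they are not defined by the SDK), that collects the settings in one place and fails early when a required value is missing:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FraudDetectorSettings:
    """Hypothetical container for the variables described above."""
    input_bucket: str
    detector_name: str
    model_name: str
    entity_type: str
    event_type: str
    region: str
    model_type: str = "ONLINE_FRAUD_INSIGHTS"
    model_version: str = "1"
    detector_version: str = "1"


def load_settings(env=None) -> FraudDetectorSettings:
    """Read AFD_* variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ["input_bucket", "detector_name", "model_name",
                "entity_type", "event_type", "region"]
    optional = ["model_type", "model_version", "detector_version"]

    # Report every missing required setting at once.
    missing = [f"AFD_{name.upper()}" for name in required
               if not env.get(f"AFD_{name.upper()}")]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    values = {name: env[f"AFD_{name.upper()}"] for name in required}
    # Optional settings override the defaults only when they are set.
    values.update({name: env[f"AFD_{name.upper()}"] for name in optional
                   if env.get(f"AFD_{name.upper()}")})
    return FraudDetectorSettings(**values)
```

The resulting object can then feed the same constructor arguments shown later in this post, for example `entity_type=settings.entity_type`.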

Lastly, install the SDK for Python with pip. In a Jupyter notebook, run:

# Install the SDK using pip
!pip install frauddetector

To run the command in a terminal instead, drop the exclamation point. You are now all set to start building your model.

Build an Amazon Fraud Detector model

In this section, we walk you through the process of building a model. Before using the SDK for Python, you need to stage the training data in an Amazon Simple Storage Service (Amazon S3) bucket. This example uses the sample data shown below:
Table showing the sample data

Stage the data

Unzip the sample data and copy the file registration_data_20K_minimum.csv into an Amazon S3 bucket that the SDK environment can access to load the data. Store the name of this bucket in the INPUT_BUCKET variable configured earlier.
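
The copy step can also be scripted. The sketch below is one possible way to do it, assuming an S3 client such as `boto3.client("s3")` with write access to the bucket. The helper name is hypothetical, and the object key mirrors the `training/` prefix used in the training call later in this post.

```python
def stage_training_data(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Upload a local CSV file to Amazon S3 and return its s3:// URI (hypothetical helper)."""
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   data_location = stage_training_data(
#       boto3.client("s3"),
#       "training_data/registration_data_20K_minimum.csv",
#       INPUT_BUCKET,
#       "training/registration_data_20K_minimum.csv")
```

The returned URI has the same form as the `data_location` argument expected when training the model.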

Imports

Import the Fraud Detector SDK and the data profiler:

from frauddetector import frauddetector, profiler

Profile the data

The Amazon Fraud Detector SDK for Python can automatically profile the data to derive the correct input format for initializing an Amazon Fraud Detector instance. The following data structures are returned by the Amazon Fraud Detector profiler get_frauddetector_inputs() utility:

Amazon Fraud Detector Labels: the values of the labels used to mark a row-event as a FRAUD or LEGIT (non-fraud) event in the EVENT_LABEL field.
Amazon Fraud Detector Variables: a list of definitions for the modelVariables defined in the data schema, providing the Amazon Fraud Detector variableType and dataType as described in the Amazon Fraud Detector documentation.
Amazon Fraud Detector Data Schema: a JSON structure that defines the field names of the input data and maps values in the EVENT_LABEL field to the FRAUD or LEGIT classification.

The data profiler generates this output from a pandas DataFrame that is passed into it:

# imports for loading the training data into a pandas DataFrame
import pandas as pd
import boto3, io

# instantiate a FraudDetector profiler
# (named data_profiler so it does not shadow the imported profiler module)
data_profiler = profiler.Profiler()
df = pd.read_csv(
    "training_data/registration_data_20K_minimum.csv")
data_schema, variables, labels = data_profiler.get_frauddetector_inputs(data=df)

The output should look similar to:
Output from loading Pandas data frame
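
To make the three structures more concrete, here is a hand-written sketch of how they could be derived from a small table. It is not the profiler's implementation, and the profiler's real output may differ in detail. The column-to-type mapping, the `fraud`/`legit` label values, and the two sample rows are assumptions for illustration. The schema layout follows the `modelVariables` and `labelSchema` fields of the Amazon Fraud Detector training data schema.

```python
import csv
import io

# Columns that carry event metadata rather than model variables.
METADATA_COLUMNS = {"EVENT_TIMESTAMP", "EVENT_LABEL"}

# Assumed mapping from column name to variable type; other columns
# fall back to a generic categorical type in this sketch.
VARIABLE_TYPES = {"email_address": "EMAIL_ADDRESS", "ip_address": "IP_ADDRESS"}

# Two made-up rows standing in for the real training file.
SAMPLE_CSV = """ip_address,email_address,EVENT_TIMESTAMP,EVENT_LABEL
192.0.2.10,alice@example.com,2021-11-13T12:18:21Z,legit
198.51.100.7,bob@example.com,2021-11-13T12:19:02Z,fraud
"""


def sketch_frauddetector_inputs(rows, fraud_values=("fraud",)):
    """Derive (data_schema, variables, labels) from a list of dict rows."""
    columns = [c for c in rows[0] if c not in METADATA_COLUMNS]
    label_values = sorted({row["EVENT_LABEL"] for row in rows})

    data_schema = {
        "modelVariables": columns,
        "labelSchema": {"labelMapper": {
            "FRAUD": [v for v in label_values if v in fraud_values],
            "LEGIT": [v for v in label_values if v not in fraud_values],
        }},
    }
    variables = [{"name": c,
                  "variableType": VARIABLE_TYPES.get(c, "CATEGORICAL"),
                  "dataType": "STRING"} for c in columns]
    labels = [{"name": v} for v in label_values]
    return data_schema, variables, labels


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_schema, variables, labels = sketch_frauddetector_inputs(rows)
```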

Train a model

First, instantiate the fraud detector SDK instance, specifying the following attributes:

entity_type – an entity that represents who is performing the event, such as a new user
event_type – a business activity that is evaluated for fraud risk, such as a user registration
detector_name – the name of this Amazon Fraud Detector project
model_name – name of the model to create
model_version – for tracking new versions of the model
model_type – valid values: “ONLINE_FRAUD_INSIGHTS” or “TRANSACTION_FRAUD_INSIGHTS” (learn more on how to choose a model type)
region – the AWS region where the Amazon Fraud Detector should be deployed
detector_version – version for tracking versions of this Amazon Fraud Detector project

To set up your FraudDetector object, you can follow this example:

detector = frauddetector.FraudDetector(
    entity_type=ENTITY_TYPE,
    event_type=EVENT_TYPE,
    detector_name=DETECTOR_NAME,
    model_name=MODEL_NAME,
    model_version=MODEL_VERSION,
    model_type=MODEL_TYPE, 
    region=REGION,
    detector_version=DETECTOR_VERSION)

Next, train a model using the Fraud Detector SDK fit() method. This takes five parameters:

data_schema – the data_schema that is provided by the Profiler
data_location – the location of the training data, which needs to be located in an Amazon S3 bucket that is accessible to the Amazon Fraud Detector instance
role – the ARN of the role to execute the Amazon Fraud Detector model build operation, which we created in the IAM setup
variables – the data variable structure for the model, as provided by the profiler
labels – the labels structure for the model, as provided by the profiler

# placeholder: replace with your own 12-digit account ID and role name
ROLE_ARN = "arn:aws:iam::999999999999:role/MyRoleWithAmazonFraudDetectorFullAccessPolicy"
detector.fit(data_schema=data_schema,
             data_location="s3://" + INPUT_BUCKET + "/training/registration_data_20K_minimum.csv",
             role=ROLE_ARN,
             variables=variables,
             labels=labels)

Check the status of the model training

The progress of the model training can be checked in the AWS console, or by calling:

# get the model status - should be TRAINING_COMPLETE before starting compile stage.
print(detector.model_status)

Training the model in this post took about one hour in a test account with the sample registration training data set referred to above.
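
Because training can take a while, a pipeline typically polls the status instead of checking it once. The following is a minimal sketch of such a loop. It assumes that `detector.model_status` reflects the current state each time it is read and that `ERROR` and `TRAINING_CANCELLED` are the terminal failure statuses. The function name and its parameters are hypothetical.

```python
import time


def wait_for_status(get_status, target="TRAINING_COMPLETE",
                    failed=("ERROR", "TRAINING_CANCELLED"),
                    poll_seconds=60, timeout_seconds=3 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns target; raise on failure or timeout."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"model training ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"status is still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Usage with the detector object from this post:
#   wait_for_status(lambda: detector.model_status)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised quickly with stand-ins, without waiting for real time to pass.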

Compile your model

You now have a trained model in Amazon Fraud Detector. In the AWS console, you can see the model and its version under Models:
Screenshot of AWS console showing the trained model
To deploy one of your models (we use version 1.0 here), you first need to define the outcomes of your model:

outcomes = [
    ("review_outcome", "Start a review process workflow"),
    ("verify_outcome", "Sideline event for review"),
    ("approve_outcome", "Approve the event")
]

We use three different outcomes: approve, verify, and review. If the model is confident that the event is legitimate, the event is approved. In the other two cases, a human review process is triggered. With these outcomes defined, we can now activate the model, which means deploying it. Once it is deployed in Amazon Fraud Detector, we will attach it to a detector. To compile, run:

detector.activate(outcomes_list=outcomes)

Deploy your model

Finally, once your model is compiled and ready to use, we need to attach it to a detector. This means creating prediction rules, which will determine the decisions made by the service with a specific prediction, and will also associate our model with a detector. This can be done by:

# create a list of rules that map model scores to outcomes
rules = [
    {'ruleId': 'high_fraud_risk',
     'expression': '$registration_model_insightscore > 900',
     'outcomes': ['verify_outcome']},
    {'ruleId': 'low_fraud_risk',
     'expression': '$registration_model_insightscore <= 900 and $registration_model_insightscore > 700',
     'outcomes': ['review_outcome']},
    {'ruleId': 'no_fraud_risk',
     'expression': '$registration_model_insightscore <= 700',
     'outcomes': ['approve_outcome']},
]

# deploy the Fraud Detector model
response = detector.deploy(rules_list=rules)

Learn more about how to define the rules expressions here.

In the example, the rule variable $registration_model_insightscore is derived by Amazon Fraud Detector by combining the model name registration_model with the default suffix _insightscore, so adjust the variable name if your model name differs. To choose appropriate values for the rule decision boundaries, check the model’s metrics in the Model Performance view of the Amazon Fraud Detector console. This view lets you experiment with different thresholds and see the estimated false and true positive rates at each one, as illustrated below:
Charts showing the metrics for the model.
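
When experimenting with different thresholds, it can help to generate the rule list from the model name and the two cut-off values instead of editing the expressions by hand. The sketch below is one way to do that. The helper names and default cut-offs are illustrative, and `outcome_for_score` only mirrors the three rules locally for sanity checks; it does not call the service.

```python
def build_threshold_rules(model_name, review_above=700, verify_above=900):
    """Build the three rules from this post for a given model name and thresholds."""
    if not review_above < verify_above:
        raise ValueError("review_above must be lower than verify_above")
    score = f"${model_name}_insightscore"
    return [
        {"ruleId": "high_fraud_risk",
         "expression": f"{score} > {verify_above}",
         "outcomes": ["verify_outcome"]},
        {"ruleId": "low_fraud_risk",
         "expression": f"{score} <= {verify_above} and {score} > {review_above}",
         "outcomes": ["review_outcome"]},
        {"ruleId": "no_fraud_risk",
         "expression": f"{score} <= {review_above}",
         "outcomes": ["approve_outcome"]},
    ]


def outcome_for_score(score, review_above=700, verify_above=900):
    """Local mirror of the rules above: which outcome would a score receive?"""
    if score > verify_above:
        return "verify_outcome"
    if score > review_above:
        return "review_outcome"
    return "approve_outcome"


rules = build_threshold_rules("registration_model")
```

With the default thresholds, the generated expressions match the rule list shown earlier, and every score falls into exactly one of the three bands.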

And with that, we are ready to make predictions.

Make Predictions

Now that we have a fully functional model, we want to make predictions with it. The SDK for Python provides two different predict functions. If you have a single event to predict, then the SDK call could look like this:

detector.predict(
    event_timestamp='2021-11-13T12:18:21Z',
    event_variables={
            'email_address' : 'johndoe@exampledomain.com',
            'ip_address' : '82.24.61.42',
        }
)

However, sometimes you have a full list of observations or an entire pandas DataFrame. The batch_predict method lets you send in your DataFrame and returns a list of predictions:

detector.batch_predict(
    events=my_data_frame,
    timestamp="EVENT_TIMESTAMP"
)

The timestamp parameter names the column of your DataFrame that contains the event timestamps. Because Amazon Fraud Detector uses the ISO 8601 format, the SDK for Python converts this column for you.
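
For single predictions, where you pass `event_timestamp` yourself, you may need to produce the same format by hand. The sketch below shows one way to render a Python `datetime` as an ISO 8601 UTC string like the one used in the `predict` example. It treats naive datetimes as UTC, which is an assumption of this sketch and not documented SDK behavior. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601_utc(value: datetime) -> str:
    """Format a datetime as an ISO 8601 UTC string such as 2021-11-13T12:18:21Z."""
    if value.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time one hour ahead of UTC is converted to UTC before formatting.
local_time = datetime(2021, 11, 13, 13, 18, 21, tzinfo=timezone(timedelta(hours=1)))
event_timestamp = to_iso8601_utc(local_time)  # "2021-11-13T12:18:21Z"
```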

Now, it is your turn! You can use the SDK in a Jupyter notebook, but you can also use it in your MLOps pipeline. For instance, wrap the SDK for Python into a Docker container and host it in an AWS Lambda function.
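
As a rough illustration of the Lambda idea, the sketch below wraps an object that exposes the `predict` method shown above in a handler function. The event layout (`event_timestamp` and `event_variables` keys) and the response shape are assumptions of this sketch. The structure of the value returned by `predict` is not covered in this post, so the handler serializes it as-is.

```python
import json


def make_handler(detector):
    """Build a Lambda-style handler around a detector exposing predict()."""
    def handler(event, context):
        result = detector.predict(
            event_timestamp=event["event_timestamp"],
            event_variables=event["event_variables"],
        )
        # default=str keeps serialization from failing on values such as datetimes.
        return {"statusCode": 200, "body": json.dumps(result, default=str)}
    return handler


# Usage inside the container image (hypothetical wiring):
#   detector = frauddetector.FraudDetector(...)  # same arguments as earlier in this post
#   lambda_handler = make_handler(detector)
```

Creating the detector object once at module load, outside the handler, lets warm invocations reuse it.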

Cleanup

Please stop the model after you are done and delete all created resources in Amazon Fraud Detector and Amazon SageMaker if you are not using them anymore. There is a Destroy resources section in the example Jupyter notebook.

Summary

After reading this post, you can build, train, and deploy your first Amazon Fraud Detector model using the open-source SDK for Python. You can now start using the SDK to train more models within a detector by starting another training job with a new model version. The SDK also lets you update your entities and event types within Amazon Fraud Detector and then train a new version of your model with the new data. Multiple model versions can be used alongside each other. The Python SDK simplifies development by offering methods that are familiar to machine learning practitioners, such as .fit(), .activate(), and .deploy(), and provides additional functionality to streamline your end-to-end workflow with Amazon Fraud Detector.

Further References

https://pypi.org/project/frauddetector/
https://github.com/aws-samples/amazon-fraud-detector-python-sdk

AWS Open Source Blog