AWS Open Source Blog
Build, train, and deploy Amazon Fraud Detector models using the open source Python SDK
Companies providing digital services are looking for ways to effectively identify fraudulent activities, such as online payment fraud and fake account creation. Amazon Fraud Detector is a fully managed service that uses machine learning (ML) and builds on 20 years of fraud detection expertise from Amazon Web Services (AWS) and Amazon.com to automatically identify potentially fraudulent activity. To enable machine learning operations (MLOps) and orchestrate the build, train, and deploy process, we present an open-source Amazon Fraud Detector SDK for Python. The library is meant to help with rapid prototyping, and it also supports programmatic deployment, model re-training, and batch prediction.
Batch prediction is especially of interest when the data used for inference is tabular. You can use the SDK for Python in any runtime environment with Python 3. Because it is a Python library downloadable from PyPI, you can also use it in a Docker container when orchestrating your MLOps pipeline. In this blog post, we walk through a step-by-step guide to using the Amazon Fraud Detector open-source SDK for Python.
Prerequisites
Set IAM permissions
In our example, we are using Amazon SageMaker to work with the open-source SDK for Python. However, the code below works on any machine with Python 3. Make sure that the user you are signed in with has access rights to Amazon Fraud Detector, as shown below for SageMaker. To use the Amazon Fraud Detector SDK for Python from a SageMaker notebook, you first need to grant the SageMaker notebook permission to call Amazon Fraud Detector APIs.
We assume that you have already created an Amazon SageMaker notebook instance. The instance is automatically associated with an AWS Identity and Access Management (IAM) execution role. In order to find the role that is attached to your instance, you can click on the instance name in the SageMaker console. Scroll down on the next screen to Permissions and encryption. You can identify the role as the hyperlink that brings you to the IAM console:
Attach the Amazon Fraud Detector policy that grants full access to the service. Once you click on the above role and the IAM console opens in a new tab, click Attach policies on the left side of the screen. Once you see all policies listed, select AmazonFraudDetectorFullAccessPolicy and click Attach policy on the bottom right.
You are now ready to use the SDK for Python on your Amazon SageMaker notebook instance.
Getting started with the SDK
The open-source SDK for Python is built to help you on your Amazon Fraud Detector journey and introduces functionality that supports working with the service. For example, it can profile your training data to derive the data schema, variables, and labels that the service expects.
Before we continue with the setup, it’s important to understand the service from a high level. The service expects tabular data. This data varies from use case to use case, but it typically contains columns such as names, email addresses, IP addresses, and other fraud-related attributes, as seen in this example Jupyter notebook.
If you are developing on your local computer or any instance other than the SageMaker environment we are using, you can copy the code below or review this example notebook:
We will use these variables as:
- INPUT_BUCKET: the Amazon Simple Storage Service (Amazon S3) bucket that contains your tabular data to train a model
- DETECTOR_NAME: the unique name of the Amazon Fraud Detector detector
- MODEL_NAME: the unique name of the Amazon Fraud Detector model
- ENTITY_TYPE: the unique name of the Amazon Fraud Detector entity stored
- EVENT_TYPE: the unique name of the Amazon Fraud Detector event type
- MODEL_TYPE: the model type
- MODEL_VERSION: the model version you want to deploy (note: when starting fresh, “1” is the default)
- DETECTOR_VERSION: the detector version you want to deploy (note: when starting fresh, “1” is the default)
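As a minimal sketch, the variables listed above could be defined as follows. Every value here is an illustrative assumption (bucket, detector, entity, and event names are hypothetical) and should be replaced with names from your own account. REGION is not in the list above, but the FraudDetector constructor shown later in this post takes it, so it is included here with an assumed value.

```python
# Illustrative configuration only: every value below is a placeholder.
INPUT_BUCKET = "my-fraud-detector-training-data"  # hypothetical S3 bucket name
REGION = "us-east-1"                              # assumed AWS Region

DETECTOR_NAME = "registration_detector"           # hypothetical detector name
MODEL_NAME = "registration_model"                 # model name used in the rule examples of this post
ENTITY_TYPE = "registration"                      # hypothetical entity type name
EVENT_TYPE = "user-registration"                  # hypothetical event type name

# Valid model types named in this post:
# "ONLINE_FRAUD_INSIGHTS" or "TRANSACTION_FRAUD_INSIGHTS"
MODEL_TYPE = "ONLINE_FRAUD_INSIGHTS"

# When starting fresh, "1" is the default for both versions.
MODEL_VERSION = "1"
DETECTOR_VERSION = "1"
```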
Lastly, you need to install the SDK for Python. You can do this via pip install. Simply use:
!pip install frauddetector
in your Jupyter notebook, or delete the exclamation point to use it in your terminal. You are now all set to get started building your model.
Build an Amazon Fraud Detector model
In this section, we will walk you through the process of building a model. Before we start using the SDK for Python, stage the training data to build the model from an Amazon Simple Storage Service (Amazon S3) bucket. This example uses sample data shown below:
Stage the data
Stage the training data to build the model from an Amazon S3 bucket. Unzip the sample data, and copy the file registration_data_20K_minimum.csv into an Amazon S3 bucket that the SDK environment can access to load the data. This location should be stored in the INPUT_BUCKET variable configured earlier.
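If you prefer to stage the file programmatically, one possible approach is the boto3 S3 client's upload_file call, sketched below. The helper names (build_s3_uri, stage_training_data) and the local path are hypothetical, and the object key under training/ is an assumption chosen to match the data_location used later in this post.

```python
def build_s3_uri(bucket, key):
    """Return the s3:// URI for an object, as used for a training data location."""
    return f"s3://{bucket}/{key}"


def stage_training_data(local_path, bucket, key):
    """Upload a local CSV file to S3 and return its s3:// URI.

    boto3 is imported lazily so that build_s3_uri can be used without it.
    The caller's credentials need write access to the bucket.
    """
    import boto3

    boto3.client("s3").upload_file(local_path, bucket, key)
    return build_s3_uri(bucket, key)


# Example call (hypothetical local path and bucket name):
# stage_training_data(
#     "training_data/registration_data_20K_minimum.csv",
#     "my-fraud-detector-training-data",
#     "training/registration_data_20K_minimum.csv",
# )
```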
Imports
Import the Fraud Detector SDK and the data profiler:
from frauddetector import frauddetector, profiler
Profile the data
The Amazon Fraud Detector SDK for Python can automatically profile the data to derive the correct input format for initializing an Amazon Fraud Detector instance. The following data structures are returned by the Amazon Fraud Detector profiler get_frauddetector_inputs() utility:
- Amazon Fraud Detector Labels: these are the values of the labels used to label a row-event as a FRAUD or LEGIT (non-fraud) event in the EVENT_LABEL field.
- Amazon Fraud Detector Variables: this is a list of definitions for the modelVariables defined in the data schema, providing the Amazon Fraud Detector variableType and datatype as described in the Amazon Fraud Detector documentation.
- Amazon Fraud Detector Data Schema: this is a JSON structure that defines the field-names of the input data and maps values in the EVENT_LABEL field to FRAUD or LEGIT classification.
The data profiler generates this output based on a Pandas data-frame that is passed into it:
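# import pandas for loading the training data into a DataFrame
import pandas as pd
# instantiate a FraudDetector profiler
# (named data_profiler so that it does not shadow the imported profiler module)
data_profiler = profiler.Profiler()
# load a local copy of the sample training data
df = pd.read_csv("training_data/registration_data_20K_minimum.csv")
# derive the data schema, variables, and labels from the DataFrame
data_schema, variables, labels = data_profiler.get_frauddetector_inputs(data=df)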
The output should look similar to:
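The exact values depend on your data set. As a rough guide to the shape of the three structures only, the hand-written sketch below builds comparable dictionaries without calling the SDK. The key names follow the Amazon Fraud Detector API (modelVariables, labelSchema, labelMapper, variableType, dataType). The helper sketch_inputs, the column names, the label values, and the CATEGORICAL fallback are assumptions for illustration, and the profiler's real output may differ in detail.

```python
# Hypothetical helper: this is NOT the SDK's profiler, only an illustration
# of the kind of structures described in the list above.
KNOWN_VARIABLE_TYPES = {
    "email_address": "EMAIL_ADDRESS",
    "ip_address": "IP_ADDRESS",
}
# Columns that carry event metadata rather than model features.
RESERVED_COLUMNS = {"EVENT_TIMESTAMP", "EVENT_LABEL"}


def sketch_inputs(columns, fraud_values, legit_values):
    """Build schema, variable, and label structures from column names."""
    feature_columns = [c for c in columns if c not in RESERVED_COLUMNS]

    data_schema = {
        "modelVariables": feature_columns,
        "labelSchema": {
            "labelMapper": {
                "FRAUD": list(fraud_values),
                "LEGIT": list(legit_values),
            }
        },
    }
    variables = [
        {
            "name": column,
            # Assumption: unknown columns fall back to a generic type.
            "variableType": KNOWN_VARIABLE_TYPES.get(column, "CATEGORICAL"),
            "dataType": "STRING",
        }
        for column in feature_columns
    ]
    labels = [{"name": value} for value in [*fraud_values, *legit_values]]
    return data_schema, variables, labels


# Assumed columns and label values for the sample registration data.
schema, variable_defs, label_defs = sketch_inputs(
    ["ip_address", "email_address", "EVENT_TIMESTAMP", "EVENT_LABEL"],
    fraud_values=["fraud"],
    legit_values=["legit"],
)
```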
Train a model
First, instantiate the fraud detector SDK instance, specifying the following attributes:
- entity_type – an entity that represents who is performing the event, such as a new user
- event_type – a business activity that is evaluated for fraud risk, such as a user registration
- detector_name – the name of this Amazon Fraud Detector project
- model_name – name of the model to create
- model_version – for tracking new versions of the model
- model_type – valid values: “ONLINE_FRAUD_INSIGHTS” or “TRANSACTION_FRAUD_INSIGHTS” (learn more on how to choose a model type)
- region – the AWS region where the Amazon Fraud Detector should be deployed
- detector_version – version for tracking versions of this Amazon Fraud Detector project
To set up your FraudDetector object, you can follow this example:
detector = frauddetector.FraudDetector(
    entity_type=ENTITY_TYPE,
    event_type=EVENT_TYPE,
    detector_name=DETECTOR_NAME,
    model_name=MODEL_NAME,
    model_version=MODEL_VERSION,
    model_type=MODEL_TYPE,
    region=REGION,
    detector_version=DETECTOR_VERSION)
Next, train a model using the Fraud Detector SDK fit() method. This takes five parameters:
- data_schema – the data_schema that is provided by the Profiler
- data_location – the location of the training data, which needs to be located in an Amazon S3 bucket that is accessible to the Amazon Fraud Detector instance
- role – the ARN of the role to execute the Amazon Fraud Detector model build operation, which we created in the IAM setup
- variables – the data variable structure for the model, as provided by the profiler
- labels – the labels structure for the model, as provided by the profiler
# placeholder ARN: replace the 12-digit account ID and role name with your own
ROLE_ARN = "arn:aws:iam::999999999999:role/MyRoleWithAmazonFraudDetectorFullAccessPolicy"
detector.fit(data_schema=data_schema,
             data_location="s3://" + INPUT_BUCKET + "/training/registration_data_20K_minimum.csv",
             role=ROLE_ARN,
             variables=variables,
             labels=labels)
Check the status of the model training
The progress of the model training can be checked in the AWS console, or by calling:
# get the model status - should be TRAINING_COMPLETE before starting compile stage.
print(detector.model_status)
Training the example model in this post took about one hour in a test account, using the sample registration training data set referred to earlier.
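Because training can take this long, it may help to poll the status instead of checking it by hand. The helper below is a generic sketch and not part of the SDK: wait_for_status, its parameters, and the default failure state "ERROR" are assumptions. It takes any callable that returns the current status string.

```python
import time


def wait_for_status(get_status, target="TRAINING_COMPLETE",
                    failure_states=("ERROR",), poll_seconds=60,
                    timeout_seconds=3 * 60 * 60, sleep=time.sleep):
    """Poll get_status() until it returns target.

    Raises RuntimeError if a failure state is seen and TimeoutError if the
    target is not reached within timeout_seconds of accumulated waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failure_states:
            raise RuntimeError(f"Training ended with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the detector from this post, a call could look like wait_for_status(lambda: detector.model_status), assuming the model_status attribute reflects the current status each time it is read.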
Compile your model
You now have a trained model in Amazon Fraud Detector. In the AWS console, you will see it listed under Models together with its version:
In order to deploy one of your models—we will use version 1.0—you need to define outcomes of your model first:
outcomes = [
    ("review_outcome", "Start a review process workflow"),
    ("verify_outcome", "Sideline event for review"),
    ("approve_outcome", "Approve the event")
]
We used three different outcomes: approve, verify, and review. If your model is certain that the event is fine, then it will approve. In the other two cases, a human review process will be triggered. With these outcomes, we can now activate the model, which means deploying it. Once it is deployed in Amazon Fraud Detector, we will attach it to a detector. To compile, run:
detector.activate(outcomes_list=outcomes)
Deploy your model
Finally, once your model is compiled and ready to use, we need to attach it to a detector. This means creating prediction rules, which determine the outcome the service returns for a given prediction score, and associating our model with a detector. This can be done by:
# create a list of rules that map model-scores to outcomes
rules = [
    {'ruleId': 'high_fraud_risk',
     'expression': '$registration_model_insightscore > 900',
     'outcomes': ['verify_outcome']},
    {'ruleId': 'low_fraud_risk',
     'expression': '$registration_model_insightscore <= 900 and $registration_model_insightscore > 700',
     'outcomes': ['review_outcome']},
    {'ruleId': 'no_fraud_risk',
     'expression': '$registration_model_insightscore <= 700',
     'outcomes': ['approve_outcome']}
]
# deploy the Fraud Detector model
response = detector.deploy(rules_list=rules)
Learn more about how to define the rules expressions here.
In the example, the rule variable $registration_model_insightscore is derived by Amazon Fraud Detector by combining the model name registration_model with the default suffix _insightscore. To choose the appropriate values for defining rule decision boundaries, check the model’s metrics in the Amazon Fraud Detector console in the Model Performance view. This allows you to experiment with different values to see the estimated false and true positive rates at this threshold as illustrated below:
And with that, we are ready to make predictions.
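Before moving on, the decision boundaries in the three rules above can be sanity-checked offline. The sketch below re-implements the thresholds in plain Python. It is not how the service evaluates rules; the RULES table and outcome_for_score are hypothetical names, and the sketch assumes that the first matching rule determines the outcome.

```python
# Plain-Python mirror of the three rule expressions defined above.
# Each entry: (rule id, predicate on the insight score, outcome name).
RULES = [
    ("high_fraud_risk", lambda score: score > 900, "verify_outcome"),
    ("low_fraud_risk", lambda score: 700 < score <= 900, "review_outcome"),
    ("no_fraud_risk", lambda score: score <= 700, "approve_outcome"),
]


def outcome_for_score(score):
    """Return (rule id, outcome) of the first rule that matches the score."""
    for rule_id, matches, outcome in RULES:
        if matches(score):
            return rule_id, outcome
    raise ValueError(f"No rule matches score {score!r}")


# Check the boundaries: 900 and 700 belong to the lower band in each case.
for score in (950, 900, 701, 700, 0):
    print(score, outcome_for_score(score))
```

Running a check like this at the boundary values confirms that the rules cover the whole score range without gaps or overlaps.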
Make Predictions
Now that we have a fully functional model, we want to make predictions with it. The SDK for Python provides two different predict functions. If you have a single event to predict, then the SDK call could look like this:
detector.predict(
    event_timestamp='2021-11-13T12:18:21Z',
    event_variables={
        'email_address': 'johndoe@exampledomain.com',
        'ip_address': '82.24.61.42',
    }
)
However, sometimes you have a full list of observations or an entire Pandas DataFrame. A batch_predict method lets you send in your DataFrame, and you will get back a list of predictions:
detector.batch_predict(
    events=my_data_frame,
    timestamp="EVENT_TIMESTAMP"
)
The timestamp parameter names the column of your DataFrame that contains the corresponding timestamp value. Because Amazon Fraud Detector uses the ISO 8601 format, the SDK for Python converts this column for you.
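To illustrate the target format, the sketch below converts a timestamp into the ISO 8601 UTC form used in the single-event example above (2021-11-13T12:18:21Z). It is not the SDK's own conversion code: to_iso8601_utc is a hypothetical helper, and treating timestamps without a time zone as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_iso8601_utc(value):
    """Format a datetime or ISO-like string as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Assumption: timestamps without a time zone are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso8601_utc("2021-11-13 12:18:21"))
print(to_iso8601_utc("2021-11-13T14:18:21+02:00"))
```

Both calls describe the same instant, so they produce the same string.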
Now, it is your turn! You can use the SDK in a Jupyter notebook, but you can also use it in your MLOps pipeline. For instance, wrap the SDK for Python into a Docker container and host it in an AWS Lambda function.
Cleanup
Please stop the model after you are done and delete all created resources in Amazon Fraud Detector and Amazon SageMaker if you are not using them anymore. There is a Destroy resources section in the example Jupyter notebook.
Summary
After reading this blog, you can build, train, and deploy your first Amazon Fraud Detector model using the open-source SDK for Python. You can now start using the SDK to train more models within a detector by starting another training job with a new model version. The SDK also lets you update your entities and event types within Amazon Fraud Detector and then train a new version of your model with the new data. Multiple model versions can be used alongside each other. The Python SDK simplifies development by offering methods familiar to machine learning practitioners, like .fit(), .activate(), and .deploy(), and provides additional functionality to streamline your end-to-end workflow with Amazon Fraud Detector.
Further References
https://pypi.org/project/frauddetector/
https://github.com/aws-samples/amazon-fraud-detector-python-sdk