AWS Machine Learning Blog

Creating Amazon SageMaker Studio domains and user profiles using AWS CloudFormation

February 2021 Update: Customers can now use native AWS CloudFormation code templates to model the infrastructure setup for Amazon SageMaker Studio and configure its access for users in their organizations at scale. For more information, see the announcement post.

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). It provides a single, web-based visual interface where you can perform all ML development steps required to build, train, tune, debug, deploy, and monitor models. In this post, we demonstrate how you can create a SageMaker Studio domain and user profile using AWS CloudFormation. AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycle by treating infrastructure as code.

Because AWS CloudFormation isn’t natively integrated with SageMaker Studio at the time of this writing, we use AWS CloudFormation to provision two AWS Lambda functions and then invoke these functions to create, delete, and update the Studio domain and user profile. In the rest of this post, we walk through the Lambda function to create the Studio domain (the code for creating a Studio user profile works similarly) and then the CloudFormation template. All the code is available in the GitHub repo.

Lambda function for creating, deleting, and updating a Studio domain

In the Lambda function, the lambda_handler calls one of three functions, handle_create, handle_update, or handle_delete, to create, update, or delete the Studio domain, respectively. Because we invoke this function using an AWS CloudFormation custom resource, AWS CloudFormation sends the custom resource request type in the RequestType field of the event. RequestType determines which function the lambda_handler calls. For example, when AWS CloudFormation detects a change in the Custom::StudioDomain section of our CloudFormation template, it sets RequestType to Update, and the handle_update function is called. The following is the lambda_handler code:

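import logging

import cfnresponse  # send helper packaged in the CfnResponseLayer
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    try:
        # AWS CloudFormation sets RequestType to Create, Update, or Delete.
        if event['RequestType'] == 'Create':
            handle_create(event, context)
        elif event['RequestType'] == 'Update':
            handle_update(event, context)
        elif event['RequestType'] == 'Delete':
            handle_delete(event, context)
    except ClientError as exception:
        # Report the failure to AWS CloudFormation so the stack operation can fail fast.
        logging.error(exception)
        cfnresponse.send(event, context, cfnresponse.FAILED,
                         {}, error=str(exception))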
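
To make the dispatch concrete, the following is a minimal sketch of the kind of event AWS CloudFormation sends for a custom resource, together with a table-driven dispatch on RequestType. The sample event values, the dispatch function, and the stub handlers are hypothetical placeholders for illustration, not the code from the repo:

```python
# Sketch only: these stubs stand in for the real handle_create,
# handle_update, and handle_delete functions.
def handle_create(event, context):
    return "created"


def handle_update(event, context):
    return "updated"


def handle_delete(event, context):
    return "deleted"


HANDLERS = {
    "Create": handle_create,
    "Update": handle_update,
    "Delete": handle_delete,
}


def dispatch(event, context=None):
    """Call the handler that matches the custom resource request type."""
    request_type = event["RequestType"]
    if request_type not in HANDLERS:
        raise ValueError(f"Unsupported RequestType: {request_type}")
    return HANDLERS[request_type](event, context)


# Hypothetical, abbreviated Update event. All values are made up.
sample_event = {
    "RequestType": "Update",
    "ResponseURL": "https://example.com/presigned-url",
    "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/demo/guid",
    "RequestId": "unique-request-id",
    "ResourceType": "Custom::StudioDomain",
    "LogicalResourceId": "StudioDomain",
    "PhysicalResourceId": "d-exampledomainid",
    "ResourceProperties": {"DomainName": "MyDomainName"},
    "OldResourceProperties": {"DomainName": "MyOldDomainName"},
}

print(dispatch(sample_event))
```

A dictionary lookup behaves like the if/elif chain above, but it also makes an unexpected request type fail loudly instead of being silently ignored.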

The three functions for creating, updating, and deleting the domain work similarly. For this post, we walk through the code responsible for creating a domain. When invoking the Lambda function through an AWS CloudFormation custom resource, we pass key parameters that help define our Studio domain via the custom resource Properties. We extract these parameters from the AWS CloudFormation event source in the Lambda function. In the handle_create function, parameters are read in from the event and passed on to the create_studio_domain function. See the following code for handle_create:

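def handle_create(event, context):
    print("**Starting running the SageMaker workshop setup code")
    # Properties declared on the custom resource arrive in ResourceProperties.
    resource_config = event['ResourceProperties']
    print("**Creating studio domain")
    response_data = create_studio_domain(resource_config)
    # Return DomainId in the response data so that the template can read it
    # with !GetAtt StudioDomain.DomainId.
    cfnresponse.send(event, context, cfnresponse.SUCCESS,
                     {'DomainId': response_data['DomainId']},
                     physicalResourceId=response_data['DomainArn'])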
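
Because the function depends on specific keys in ResourceProperties, it can help to validate them before calling SageMaker. The following sketch uses a hypothetical helper, read_domain_config, with the property names used in this post. Accepting SubnetIds either as a comma-delimited string or as a list is an assumption about how the template parameter is declared:

```python
# Property names match the custom resource in this post; the helper itself
# is hypothetical and not part of the repo.
REQUIRED_KEYS = ("VPC", "SubnetIds", "DomainName", "DefaultUserSettings")


def read_domain_config(event):
    """Extract and validate the Studio domain settings from a custom resource event."""
    props = event["ResourceProperties"]
    missing = [key for key in REQUIRED_KEYS if key not in props]
    if missing:
        raise ValueError("Missing resource properties: " + ", ".join(missing))

    subnet_ids = props["SubnetIds"]
    if isinstance(subnet_ids, str):
        # A String parameter arrives as "subnet-a,subnet-b".
        subnet_ids = [item.strip() for item in subnet_ids.split(",") if item.strip()]

    return {
        "VpcId": props["VPC"],
        "SubnetIds": list(subnet_ids),
        "DomainName": props["DomainName"],
        "DefaultUserSettings": props["DefaultUserSettings"],
    }
```

Failing early with a clear message makes a template mistake easier to diagnose than a KeyError raised deep inside the creation code.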

We use a boto3 SageMaker client to create Studio domains. For this post, we set the domain name, the VPC and subnets that Studio uses, and the SageMaker execution role for the Studio domain. After the create_domain API call is made, we check the creation status every 5 seconds. When creation is complete, we return the describe_domain response, which includes the Amazon Resource Name (ARN) and the URL of the created domain. By default, Lambda stops a function after it has run for 3 seconds, which is not enough time to wait for a domain to be created. Therefore, make sure that the timeout limit of your Lambda function is set appropriately. We set the timeout limit to 900 seconds, the maximum that Lambda allows. The following is the create_studio_domain code (the functions for deleting and updating domains are also implemented using boto3 and constructed in a similar fashion):

import logging
import time

import boto3

client = boto3.client('sagemaker')


def create_studio_domain(config):
    vpc_id = config['VPC']
    subnet_ids = config['SubnetIds']
    default_user_settings = config['DefaultUserSettings']
    domain_name = config['DomainName']

    response = client.create_domain(
        DomainName=domain_name,
        AuthMode='IAM',
        DefaultUserSettings=default_user_settings,
        # SubnetIds is passed in as a comma-delimited string.
        SubnetIds=subnet_ids.split(','),
        VpcId=vpc_id
    )

    # The domain ID is the last segment of the domain ARN.
    domain_id = response['DomainArn'].split('/')[-1]
    created = False
    # Poll every 5 seconds until the domain is in service.
    while not created:
        response = client.describe_domain(DomainId=domain_id)
        time.sleep(5)
        if response['Status'] == 'InService':
            created = True

    logging.info("**SageMaker domain created successfully: %s", domain_id)
    return response
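
The polling loop above waits only for the InService status. As a hedged sketch, the following hypothetical helper, wait_for_status, also stops when the domain reports a Failed status or when a deadline passes. The function name, its parameters, and the 840-second default (chosen here to stay under the 900-second Lambda timeout) are assumptions for illustration:

```python
import time


def wait_for_status(describe, target="InService", failed=("Failed",),
                    interval=5, timeout=840,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll describe() until its Status equals target, fails, or times out.

    describe is any zero-argument callable that returns a dict with a
    'Status' key, such as a wrapper around client.describe_domain.
    """
    deadline = clock() + timeout
    while True:
        response = describe()
        status = response["Status"]
        if status == target:
            return response
        if status in failed:
            reason = response.get("FailureReason", "no reason given")
            raise RuntimeError(f"Resource entered status {status}: {reason}")
        if clock() >= deadline:
            raise TimeoutError(f"Status was still {status} after {timeout} seconds")
        sleep(interval)
```

With a boto3 client, a call could look like wait_for_status(lambda: client.describe_domain(DomainId=domain_id)). Because the lambda_handler in this post catches only ClientError, the RuntimeError and TimeoutError raised here would also need to be caught there for a FAILED response to reach AWS CloudFormation.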

Finally, we zip the Python script, save it as domain_function.zip, and upload it to Amazon Simple Storage Service (Amazon S3).
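
The packaging step can also be scripted. The following is a minimal sketch, assuming the function code lives in a local file and that the bucket already exists; the helper names, the bucket, and the key are hypothetical. The archive member is named lambda_function.py so that it matches the lambda_function.lambda_handler handler configured in the template:

```python
import io
import zipfile


def build_zip(source_path, arcname="lambda_function.py"):
    """Return the bytes of a zip archive that contains one Python file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(source_path, arcname=arcname)
    return buffer.getvalue()


def upload_zip(zip_bytes, bucket, key):
    """Upload the archive to Amazon S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported here so that build_zip works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=zip_bytes)
```

For example, upload_zip(build_zip("lambda_function.py"), "my-example-bucket", "function.zip") would package and upload the file in one step, where the bucket name is a placeholder.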

The Lambda function used for creating a user profile is constructed similarly. For more information, see the UserProfile_function.py script in the GitHub repo.

CloudFormation template

In the CloudFormation template, we create an execution role for Lambda, an execution role for SageMaker Studio, and the Lambda function using the code explained in the previous section. We invoke this function by specifying it as the target of a custom resource. For more information about invoking a Lambda function with AWS CloudFormation, see Using AWS Lambda with AWS CloudFormation.

Lambda execution role

This role gives our Lambda function the permission to create an Amazon CloudWatch Logs stream and write logs to CloudWatch. Because we create, delete, and update Studio domains in our function, we also grant this role the permission to do so. See the following code:

  LambdaExecutionRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /

  LambdaExecutionPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Path: /
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: CloudWatchLogsPermissions
            Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: !Sub "arn:${AWS::Partition}:logs:*:*:*"
          - Sid: SageMakerDomainPermission
            Effect: Allow
            Action:
              - sagemaker:CreateDomain
              - sagemaker:DescribeDomain
              - sagemaker:DeleteDomain
              - sagemaker:UpdateDomain
              - sagemaker:CreateUserProfile
              - sagemaker:UpdateUserProfile
              - sagemaker:DeleteUserProfile
              - sagemaker:DescribeUserProfile
            Resource:
              - !Sub "arn:${AWS::Partition}:sagemaker:*:*:domain/*"
              - !Sub "arn:${AWS::Partition}:sagemaker:*:*:user-profile/*"
          - Sid: SageMakerExecPassRole
            Effect: Allow
            Action:
              - iam:PassRole
            Resource: !GetAtt  SageMakerExecutionRole.Arn
      Roles:
        - !Ref LambdaExecutionRole

SageMaker execution role

The following SageMaker execution role is attached to Studio (for demonstration purposes, we grant this role the AmazonSageMakerFullAccess managed policy):

SageMakerExecutionRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - sagemaker.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

Lambda function

The AWS::Lambda::Function resource creates a Lambda function. To create a function, we need a deployment package and an execution role. The deployment package contains our function code (function.zip). The execution role, which is the LambdaExecutionRole created in the previous step, grants the function permission to write logs to CloudWatch and to create, update, and delete Studio domains and user profiles. We also add the CfnResponseLayer to our function’s execution environment. CfnResponseLayer enables the function to interact with an AWS CloudFormation custom resource. It contains a send method to send responses from Lambda to AWS CloudFormation. See the following code:

Resources:
...
  StudioDomainFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        S3Bucket: !Ref S3Bucket
        S3Key: function.zip
        S3ObjectVersion: !Ref S3ObjectVersion
      Runtime: python3.8
      Timeout: 900
      Layers:
        - !Ref CfnResponseLayer
       
  CfnResponseLayer:
    Type: AWS::Lambda::LayerVersion
    Properties:
      CompatibleRuntimes:
        - python3.8
      Content:
        S3Bucket: !Ref S3Bucket
        S3Key: cfnResponse-layer.zip
      Description: cfn-response layer
      LayerName: cfn-response
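
To illustrate what a send method of this kind typically does, the following is a sketch of the custom resource response protocol: the function builds a JSON document and uploads it with an HTTP PUT to the presigned ResponseURL from the event. This is an assumption-based illustration written with the Python standard library; the function names and parameters are hypothetical, and the layer in the repo may be implemented differently:

```python
import json
import urllib.request


def build_response_body(event, context, status, data=None,
                        physical_resource_id=None, reason=None):
    """Build the JSON document that AWS CloudFormation expects from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See CloudWatch log stream: {context.log_stream_name}",
        "PhysicalResourceId": physical_resource_id or context.log_stream_name,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},  # keys in Data can be read with !GetAtt in the template
    }


def send(event, context, status, data=None,
         physical_resource_id=None, reason=None):
    """PUT the response document to the presigned URL in the event."""
    body = json.dumps(
        build_response_body(event, context, status, data,
                            physical_resource_id, reason)
    ).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        headers={"Content-Type": ""},  # left empty so the presigned URL accepts the upload
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

If the function never sends a response, AWS CloudFormation keeps waiting until the stack operation times out, which is why the lambda_handler sends a FAILED status when it catches an error.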

Invoking the Lambda function using an AWS CloudFormation custom resource

Custom resources provide a way for you to write custom provisioning logic in a CloudFormation template and have AWS CloudFormation run it during a stack operation, such as when you create, update, or delete a stack. For more information, see Custom resources. We get the ARN of the Lambda function created in the previous step and pass it to AWS CloudFormation as our service token. This allows AWS CloudFormation to invoke the Lambda function. We pass the parameters required for creating, updating, and deleting our domain under Properties. See the following code:

StudioDomain:
    Type: Custom::StudioDomain
    Properties:
      ServiceToken: !GetAtt StudioDomainFunction.Arn
      VPC: !Ref VPCId
      SubnetIds: !Ref SubnetIds
      DomainName: "MyDomainName"
      DefaultUserSettings:
        ExecutionRole: !GetAtt SageMakerExecutionRole.Arn

In the same fashion, we invoke the Lambda function for creating a user profile:

UserProfile:
    Type: Custom::UserProfile
    Properties:
      ServiceToken: !GetAtt UserProfileFunction.Arn
      DomainId: !GetAtt StudioDomain.DomainId
      UserProfileName: !Ref UserProfileName
      UserSettings:
        ExecutionRole: !GetAtt SageMakerExecutionRole.Arn
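
On the Lambda side, these properties could be consumed along the following lines. This is a minimal sketch that mirrors create_studio_domain, not the UserProfile_function.py script from the repo; the function name, the injected client argument, and the Failed check are assumptions for illustration:

```python
import time


def create_user_profile(client, config, interval=5, sleep=time.sleep):
    """Create a Studio user profile and wait until it is in service.

    client is a boto3 SageMaker client; config is the ResourceProperties
    dict of the Custom::UserProfile resource.
    """
    domain_id = config["DomainId"]
    profile_name = config["UserProfileName"]

    client.create_user_profile(
        DomainId=domain_id,
        UserProfileName=profile_name,
        UserSettings=config["UserSettings"],
    )

    while True:
        response = client.describe_user_profile(
            DomainId=domain_id, UserProfileName=profile_name
        )
        status = response["Status"]
        if status == "InService":
            return response
        if status == "Failed":
            reason = response.get("FailureReason", "no reason given")
            raise RuntimeError(f"User profile creation failed: {reason}")
        sleep(interval)
```

Inside a handler, the call could look like create_user_profile(boto3.client('sagemaker'), event['ResourceProperties']). Passing the client in as an argument also makes the function straightforward to test with a stub.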

Conclusion

In this post, we walked through the steps of creating, deleting, and updating SageMaker Studio domains using AWS CloudFormation and Lambda. The sample files are available in the GitHub repo. For information about creating a Studio domain inside a VPC, see Securing Amazon SageMaker Studio connectivity using a private VPC. For more information about SageMaker Studio, see Get Started with Amazon SageMaker Studio.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial services and insurance industries build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Joseph Jegan is a Cloud Application Architect at Amazon Web Services. He helps AWS customers use AWS services to design scalable and secure applications. Before joining AWS, he spent over 20 years in software development, building e-commerce platforms for large retail customers. He is based in the New York metro area and enjoys learning emerging cloud native technologies.

David Ping is a Principal Machine Learning Solutions Architect and Sr. Manager of AI/ML Solutions Architecture at Amazon Web Services. He helps enterprise customers build and operate machine learning solutions on AWS. In his spare time, David enjoys hiking and reading the latest machine learning articles.