Amazon AWS Official Blog

How to automatically apply custom sensitive-data tags to S3 objects based on Macie findings

Overview

Once a sensitive data discovery job has completed, tagging the data according to your classification and grading definitions is one of the first steps in enterprise data governance.

Amazon Macie is a data security service that uses machine learning and pattern matching to discover sensitive data and provide visibility into it, so that data security risks can be mitigated automatically. Many customers who use Macie to find sensitive data in S3 want the affected S3 objects to be tagged automatically, and they want the tag values to be defined by their own organization rather than by Macie's built-in severity levels of High, Medium, and Low.

This post describes a solution that supports custom tag values and automatically tags S3 objects based on Macie findings. It includes a CloudFormation template for automated deployment as well as example CLI commands.

Prerequisites

  • The AWS CLI installed and configured with permissions to create CloudFormation stacks, IAM roles, Lambda functions, EventBridge rules, and S3 buckets
  • Amazon Macie enabled in the target AWS Region

Architecture and how it works

In this solution, when an Amazon Macie sensitive data discovery job completes, its findings are automatically delivered to Amazon EventBridge. An EventBridge rule then triggers a Lambda function that tags the affected files in S3. The overall flow is shown below:

Solution architecture diagram

The template provided in this example divides sensitive data tags into four levels; the detailed definitions are described below.

Deployment

Copy the two files in the appendix and save them to the local working directory where you run the CLI, then set the following parameters:

  • tagkey: the name (key) of the custom data tag
  • level0 level1 level2 level3: the values (value) of the four tag levels; 0 through 3 run from lowest to highest
  • s3filepath:
    A mapping.json file is needed to define how sensitive data types map to tag levels, and it will be uploaded to S3. Copy mapping.json from the appendix and save it locally. Based on your organization's definitions, change the default 0 on each line to the appropriate sensitivity level; for example, if you consider ADDRESS (residential address) to be level2 information, change the 0 after "ADDRESS": to 2. The Lambda function uses this file to pick the highest level among the sensitive data types found in an object and tags the object accordingly. For example, if a file contains both ADDRESS at level 2 and BANK_ACCOUNT_NUMBER at level 3, Lambda applies the tag value defined for level3. A quick way to sanity-check the edited file is sketched after this list. The beginning of the file looks like this:
{
  "ADDRESS": 2,
  "AUSTRALIA_DRIVERS_LICENSE": 0,
  "AUSTRALIA_TAX_FILE_NUMBER": 0,
  "AUSTRIA_DRIVERS_LICENSE": 0,
  "AWS_CREDENTIALS": 0,
  "BANK_ACCOUNT_NUMBER": 0,
  "BELGIUM_DRIVERS_LICENSE": 0,
  • region: the AWS Region in which to run the CloudFormation template
  • stackname: the name of the CloudFormation stack
  • template: the CloudFormation stack template provided in the appendix, saved as a YAML file
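
For example, assuming jq is installed, the following sketch validates the edited mapping.json and lists any entries whose level falls outside the 0-3 range used by this template (adjust the bounds if you change the number of levels):

# Validate that mapping.json is well-formed JSON
jq empty mapping.json
# List any entries whose level is outside the 0-3 range (ideally this prints an empty object)
jq 'with_entries(select(.value < 0 or .value > 3))' mapping.json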

Run the following example commands to set the parameters:

tagkey='敏感度标识'  # sensitivity label
level0='公开'  # public
level1='内部'  # internal
level2='保密'  # confidential
level3='机密'  # secret
s3filepath=mapping.json
region=us-east-1
stackname=MacieAutotag
template=blog-template.yaml

The following example command creates the CloudFormation stack:

aws cloudformation create-stack --stack-name $stackname --template-body file://$template \
--parameters  \
ParameterKey=level0,ParameterValue=$level0 \
ParameterKey=level1,ParameterValue=$level1 \
ParameterKey=level2,ParameterValue=$level2  \
ParameterKey=level3,ParameterValue=$level3 \
ParameterKey=tagkey,ParameterValue=$tagkey  \
ParameterKey=s3filepath,ParameterValue=$s3filepath \
--capabilities CAPABILITY_IAM \
--region=$region
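
If you are scripting the deployment, you can optionally block until the stack has finished creating, for example:

aws cloudformation wait stack-create-complete --stack-name $stackname --region $region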

The CloudFormation stack takes a few minutes to create. Once it completes, run the following example CLI command to upload mapping.json to the newly created S3 bucket for the Lambda function to use:

aws s3 cp  $s3filepath s3://$(aws cloudformation --region $region describe-stacks --stack-name $stackname --query 'Stacks[*].Outputs[0].OutputValue' --output text)/ --region=$region

Sign in to the AWS console, open the newly created Lambda function, and go to Configuration -> Environment variables. You can see that the tag values defined earlier are reflected there. Based on these definitions, the Lambda function applies the corresponding tag to objects in S3 according to the Macie findings.

Lambda environment variables
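
If you prefer the CLI, the following sketch looks up the function name from the stack and prints the same environment variables (adjust the query as needed for your setup):

lambdaname=$(aws cloudformation describe-stack-resources --stack-name $stackname --region $region \
--query "StackResources[?ResourceType=='AWS::Lambda::Function'].PhysicalResourceId" --output text)
aws lambda get-function-configuration --function-name $lambdaname --region $region --query 'Environment.Variables'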

Results

Start a Macie scan job. After it completes, open the Lambda function's CloudWatch log group in the AWS console to view the execution records. As shown below, based on the Macie finding CREDIT_CARD_NUMBER, Lambda tagged the file 1-financial-data.txt with the level3 value: 机密 (secret).
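
For reference, assuming Macie is already enabled in the Region, a one-time classification job against a test bucket can be started from the CLI roughly as follows (the account ID and bucket name below are placeholders):

aws macie2 create-classification-job \
--job-type ONE_TIME \
--name macie-autotag-demo \
--s3-job-definition '{"bucketDefinitions":[{"accountId":"111122223333","buckets":["your-test-bucket"]}]}' \
--region $region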

Lambda execution log

Open the file in S3 and choose Properties -> Tags; you can see that the tag has been applied.

The tag applied to the S3 object
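
The same check can also be made from the CLI; for example (the bucket name is a placeholder):

aws s3api get-object-tagging --bucket your-test-bucket --key 1-financial-data.txt --region $region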

Summary

In this post, I showed you a simple and convenient way to define custom sensitive data tags and have them applied automatically, triggered by Macie scan results. Note that this example defines four tag levels; you can change that to three or five levels to match your organization's needs by adjusting the CloudFormation parameters, the Lambda function's environment variables, and the range(4) loop in the Lambda code. The mapping.json template lists all of Macie's current managed data identifiers (MDIs); if you use custom data identifiers (CDIs), simply add the CDI names to the file. The example in this post is deployed in a single Region; you can use AWS CloudFormation StackSets to deploy the template across multiple AWS accounts and Regions.

Appendix

Mapping.json

{
  "ADDRESS": 2,
  "AUSTRALIA_DRIVERS_LICENSE": 0,
  "AUSTRALIA_TAX_FILE_NUMBER": 0,
  "AUSTRIA_DRIVERS_LICENSE": 0,
  "AWS_CREDENTIALS": 0,
  "BANK_ACCOUNT_NUMBER": 0,
  "BELGIUM_DRIVERS_LICENSE": 0,
  "BRAZIL_CEP_CODE": 0,
  "BRAZIL_CNPJ_NUMBER": 0,
  "BRAZIL_CPF_NUMBER": 0,
  "BRAZIL_PHONE_NUMBER": 0,
  "BRAZIL_RG_NUMBER": 0,
  "BULGARIA_DRIVERS_LICENSE": 0,
  "CANADA_DRIVERS_LICENSE": 0,
  "CANADA_HEALTH_NUMBER": 0,
  "CANADA_NATIONAL_IDENTIFICATION_NUMBER": 0,
  "CANADA_PASSPORT_NUMBER": 0,
  "CANADA_SOCIAL_INSURANCE_NUMBER": 0,
  "CREDIT_CARD_EXPIRATION": 3,
  "CREDIT_CARD_MAGNETIC_STRIPE": 0,
  "CREDIT_CARD_NUMBER": 3,
  "CREDIT_CARD_NUMBER_(NO_KEYWORD)": 3,
  "CREDIT_CARD_SECURITY_CODE": 3,
  "CROATIA_DRIVERS_LICENSE": 0,
  "CYPRUS_DRIVERS_LICENSE": 0,
  "CZECHIA_DRIVERS_LICENSE": 0,
  "DATE_OF_BIRTH": 0,
  "DENMARK_DRIVERS_LICENSE": 0,
  "DRIVERS_LICENSE": 0,
  "ESTONIA_DRIVERS_LICENSE": 0,
  "EUROPEAN_HEALTH_INSURANCE_CARD_NUMBER": 0,
  "FINLAND_DRIVERS_LICENSE": 0,
  "FINLAND_EUROPEAN_HEALTH_INSURANCE_NUMBER": 0,
  "FRANCE_BANK_ACCOUNT_NUMBER": 0,
  "FRANCE_DRIVERS_LICENSE": 0,
  "FRANCE_HEALTH_INSURANCE_NUMBER": 0,
  "FRANCE_NATIONAL_IDENTIFICATION_NUMBER": 0,
  "FRANCE_PASSPORT_NUMBER": 0,
  "FRANCE_PHONE_NUMBER": 0,
  "FRANCE_TAX_IDENTIFICATION_NUMBER": 0,
  "GERMANY_BANK_ACCOUNT_NUMBER": 0,
  "GERMANY_DRIVERS_LICENSE": 0,
  "GERMANY_NATIONAL_IDENTIFICATION_NUMBER": 0,
  "GERMANY_PASSPORT_NUMBER": 0,
  "GERMANY_PHONE_NUMBER": 0,
  "GERMANY_TAX_IDENTIFICATION_NUMBER": 0,
  "GREECE_DRIVERS_LICENSE": 0,
  "HTTP_BASIC_AUTH_HEADER": 0,
  "HTTP_COOKIE": 0,
  "HUNGARY_DRIVERS_LICENSE": 0,
  "IRELAND_DRIVERS_LICENSE": 0,
  "ITALY_BANK_ACCOUNT_NUMBER": 0,
  "ITALY_DRIVERS_LICENSE": 0,
  "ITALY_NATIONAL_IDENTIFICATION_NUMBER": 0,
  "ITALY_PASSPORT_NUMBER": 0,
  "ITALY_PHONE_NUMBER": 0,
  "JSON_WEB_TOKEN": 0,
  "LATITUDE_LONGITUDE": 0,
  "LATVIA_DRIVERS_LICENSE": 0,
  "LITHUANIA_DRIVERS_LICENSE": 0,
  "LUXEMBOURG_DRIVERS_LICENSE": 0,
  "MALTA_DRIVERS_LICENSE": 0,
  "MEDICAL_DEVICE_UDI": 0,
  "NAME": 3,
  "NETHERLANDS_DRIVERS_LICENSE": 0,
  "OPENSSH_PRIVATE_KEY": 0,
  "PGP_PRIVATE_KEY": 0,
  "PHONE_NUMBER": 0,
  "PKCS": 0,
  "POLAND_DRIVERS_LICENSE": 0,
  "PORTUGAL_DRIVERS_LICENSE": 0,
  "PUTTY_PRIVATE_KEY": 0,
  "ROMANIA_DRIVERS_LICENSE": 0,
  "SLOVAKIA_DRIVERS_LICENSE": 0,
  "SLOVENIA_DRIVERS_LICENSE": 0,
  "SPAIN_BANK_ACCOUNT_NUMBER": 0,
  "SPAIN_DNI_NUMBER": 0,
  "SPAIN_DRIVERS_LICENSE": 0,
  "SPAIN_NIE_NUMBER": 0,
  "SPAIN_NIF_NUMBER": 0,
  "SPAIN_PASSPORT_NUMBER": 0,
  "SPAIN_PHONE_NUMBER": 0,
  "SPAIN_SOCIAL_SECURITY_NUMBER": 0,
  "SPAIN_TAX_IDENTIFICATION_NUMBER": 0,
  "SWEDEN_DRIVERS_LICENSE": 0,
  "UK_BANK_ACCOUNT_NUMBER": 0,
  "UK_DRIVERS_LICENSE": 0,
  "UK_ELECTORAL_ROLL_NUMBER": 0,
  "UK_NATIONAL_INSURANCE_NUMBER": 0,
  "UK_NHS_NUMBER": 0,
  "UK_PASSPORT_NUMBER": 0,
  "UK_PHONE_NUMBER": 0,
  "UK_TAX_IDENTIFICATION_NUMBER": 0,
  "USA_HEALTHCARE_PROCEDURE_CODE": 0,
  "USA_HEALTH_INSURANCE_CLAIM_NUMBER": 0,
  "USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER": 0,
  "USA_MEDICARE_BENEFICIARY_IDENTIFIER": 0,
  "USA_NATIONAL_DRUG_CODE": 0,
  "USA_NATIONAL_PROVIDER_IDENTIFIER": 0,
  "USA_PASSPORT_NUMBER": 2,
  "USA_SOCIAL_SECURITY_NUMBER": 1,
  "US_DRUG_ENFORCEMENT_AGENCY_NUMBER": 0,
  "VEHICLE_IDENTIFICATION_NUMBER": 0,
  "ChineseID":4
}

CloudFormation template

AWSTemplateFormatVersion: 2010-09-09
Parameters:
  level0:
    Type: String
    Default: level0
    Description: the lowest tag value, e.g. public
  level1:
    Type: String
    Default: level1
    Description: the second lowest tag value, e.g. internal
  level2:
    Type: String
    Default: level2
    Description: the second highest tag value, e.g. sensitive
  level3:
    Type: String
    Default: level3
    Description: the highest tag value, e.g. secret
  tagkey:
    Type: String
    Default: datalevel
    Description: the key of the tag, e.g. datalabel
  s3filepath:
    Type: String
    Default: mapping.json
    Description: key name of the JSON file that maps Macie identifiers to your defined levels

Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      AccessControl: BucketOwnerFullControl
      LifecycleConfiguration:
        Rules:
          -
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 3
            NoncurrentVersionExpirationInDays: 3
            Status: Enabled
     
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      Tags:
        -
          Key: demo
          Value: macie-auto-tag
      VersioningConfiguration:
        Status: Enabled
  EventRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: EventRule
      EventPattern:
        source:
          - aws.macie
        detail-type:
          - Macie Finding
        detail:
          category:
            - CLASSIFICATION
      State: ENABLED
      Targets:
        - Arn:
            'Fn::GetAtt':
              - LambdaFunction
              - Arn
          Id: '1'
  PermissionForEventsToInvokeLambda:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName:
        Ref: LambdaFunction
      Action: 'lambda:InvokeFunction'
      Principal: events.amazonaws.com
      SourceArn:
        'Fn::GetAtt':
          - EventRule
          - Arn
  LambdaFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Runtime: python3.9
      Role: !GetAtt IAMRole.Arn
      Handler: index.lambda_handler
      Timeout: 600
      Environment:
        Variables:
          level0: !Ref level0
          level1: !Ref level1
          level2: !Ref level2
          level3: !Ref level3
          tagkey: !Ref tagkey
          mappingbucket: !Ref S3Bucket
          mappingfile: !Ref s3filepath
      Code:
        ZipFile: |
          import json
          import boto3
          import os
          s3 = boto3.client('s3')
          typelist=[]
          # Get all the values configured in the environment variables
          tagkey=os.environ['tagkey']
          mappingbucket=os.environ['mappingbucket']
          mappingfile=os.environ['mappingfile']
          dataclass=[]
          for i in range(4): # if you use 3 or 5 levels, change this number and the environment variables accordingly
              dataclass.append(os.environ[('level'+str(i))])
          # Read the customer-defined type-to-level mapping JSON file from S3
          def getmapping(mappingbucket,mappingfile):
              s3 = boto3.resource('s3')
              obj = s3.Object(mappingbucket, mappingfile)
              data = json.load(obj.get()['Body']) 
              #print(data)
              return(data)
          datadic=getmapping(mappingbucket,mappingfile)
          #print(datadic)
          # Determine the level of the object's existing tag (if any), and return the tag
          # set with that tag stripped out so it can be replaced; an unknown value is
          # treated as level 0
          def currenttag(filetag):
              oldlevel = 0
              remaining = []
              for each in filetag:
                  if each['Key'] == tagkey:
                      oldvalue = each['Value']
                      if oldvalue in dataclass:
                          oldlevel = dataclass.index(oldvalue)
                      else:
                          print(oldvalue, ' is not defined, please check; it will be treated as level 0')
                          oldlevel = 0
                  else:
                      remaining.append(each)
              return (oldlevel, remaining)
          # Look up the level of every detected sensitive data type and return the highest
          # level found, including the level of the object's existing tag
          def taglevel(newtypelist, oldlevel):
              levels = [oldlevel]
              for each in newtypelist:
                  print('sensitive data type: ', each)
                  if each in datadic:
                      levels.append(datadic[each])
                  else:
                      print(each, ' has no level defined, please update your mapping json!')
              print("found these sensitive levels: {}".format(levels))
              return max(levels)
          def gettag(s3name,filename):
              response = s3.get_object_tagging(
              Bucket=s3name,
              Key=filename)
              return(response["TagSet"])
          def tag(s3name,filename,filetag):
              response = s3.put_object_tagging(
                  Bucket=s3name,
                  Key=filename,
                  Tagging={
                      'TagSet': filetag
                  }
              )
              return response
          def lambda_handler(event, context):
              # Reset the list of detected types for this invocation, then get the S3 object
              # information based on the event type (Macie or Security Hub)
              typelist = []
              sourcetype = event["detail-type"]
              if sourcetype == "Macie Finding":  # Macie events use lower camel case field names
                  accountid = event["detail"]["accountId"]
                  region = event["detail"]["region"]
                  s3name = event["detail"]["resourcesAffected"]["s3Bucket"]["name"]
                  filename = event["detail"]["resourcesAffected"]["s3Object"]["key"]
                  #filetag = event["detail"]["resourcesAffected"]["s3Object"]["tags"] # not used: tag field names in a Macie finding are lowercase
                  customrule=event["detail"]["classificationDetails"]["result"]["customDataIdentifiers"]["detections"]
                  managedrule=event["detail"]["classificationDetails"]["result"]["sensitiveData"]
                  for i in range(len(customrule)):
                      typelist.append(customrule[i]["name"])
                  # Collect detections from managed data identifiers
                  for i in range(len(managedrule)):
                      temp=managedrule[i]["detections"]
                      for each in temp:
                          typelist.append(each['type'])
              if sourcetype=="Security Hub Findings - Custom Action":# sechub finding use Capital letters
                  accountid = event["detail"]["findings"][0]["Resources"][0]["Details"]["AwsS3Bucket"]["OwnerAccountId"]
                  region = event["detail"]["findings"][0]["Resources"][0]["Region"]
                  #s3 = event["detail"]["findings"]["Resources"][0]["Id"]
                  #s3name=s3arn.split(':')[5]
                  s3info= event["detail"]["findings"][0]["ProductFields"]["S3Object.Path"]
                  s3name=s3info.split('/')[0]
                  filename=s3info[s3info.find('/')+1:]
                  customrule=event["detail"]["findings"][0]["Resources"][1]["DataClassification"]["Result"]["CustomDataIdentifiers"]["Detections"]
                  managedrule=event["detail"]["findings"][0]["Resources"][1]["DataClassification"]["Result"]["SensitiveData"]
                  for i in range(len(customrule)):
                      typelist.append(customrule[i]["Name"])
                  for i in range(len(managedrule)):
                      temp=managedrule[i]["Detections"]
                      for each in temp:
                          typelist.append(each['Type'])
              newtypelist=list(set(typelist))
              #print("scanned sensitive data types : {}".format(newtypelist))
              filetag = gettag(s3name,filename)
              #print(filetag)
              # Decide the tag level to apply: compare the existing tag level with the newly
              # scanned results and choose the highest
              oldlevel, filetag = currenttag(filetag)
              print('The old tag level is: ' + str(oldlevel))
              valuelevel = taglevel(newtypelist, oldlevel)
              print('The new tag level is: ' + str(valuelevel))
              #print(len(dataclass))
              if valuelevel < len(dataclass):
                  value=dataclass[valuelevel]
                  if oldlevel==0:
                      # Tag the object in S3 with the corresponding label
                      filetag.append({'Key':tagkey,'Value':value})
                      print("Lambda is tagging your object: {0} \n which is in S3 bucket {1} with \n tag level: {2} \n tag value: {3}".format(filename,s3name,valuelevel,value))
                      result=tag(s3name,filename,filetag)
                      return (result)
              else:
                  print("undefined tagging label for scanned types {}, please check your mapping file!".format(typelist))
      Description: detect Macie findings and auto-tag the affected S3 objects
      TracingConfig:
        Mode: Active
  IAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Description: basic Lambda execution role plus S3 object tagging permissions
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Policies:
        - PolicyName: macie-eb-s3-lambda-policy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObjectTagging'
                  - 's3:GetObject'
                  - 's3:PutObjectTagging'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource:
                  - '*'

Outputs:
  S3BucketName:
    Value: !Ref S3Bucket
    Description: name of the S3 bucket to which mapping.json should be uploaded

About the Authors

Jessica Wang

Senior Security Consultant on the Amazon Web Services Professional Services team, responsible for providing customers with cloud security consulting, architecture design, and technical implementation services.

李潇翌

Security Consultant on the Amazon Web Services Professional Services team with many years of experience in the security industry, responsible for designing and implementing security architectures on Amazon Web Services. Focuses mainly on network security, cloud security, and countering organized online fraud and abuse such as promotion abuse ("薅羊毛").