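Overview
Once sensitive data discovery is complete, tagging data according to your classification and grading definitions is the first step in enterprise data management.
Amazon Macie is a data security service that uses machine learning and pattern matching to discover sensitive data, provide visibility into it, and help automate protection against data security risks. After using Macie to discover sensitive data in Amazon S3, many customers want their S3 objects to be tagged automatically with sensitivity labels. They also want those labels to be defined by the enterprise itself rather than by Macie's built-in severity levels of High, Medium, and Low.
This post introduces a solution that supports custom tag values and automatically tags S3 objects based on Macie findings. It provides a CloudFormation template for automated deployment, along with sample CLI commands.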
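Prerequisites
Architecture and how it works
In this solution, when Amazon Macie runs a sensitive data discovery job, its findings are automatically sent to Amazon EventBridge. An EventBridge rule then invokes a Lambda function that tags the affected objects in S3. The overall flow is shown in the following diagram: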
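To make the flow concrete, here is a minimal Python sketch of the part of a Macie finding event that the Lambda function relies on, and how the bucket, object key, and detected data types can be pulled out of it. The event is a trimmed, hypothetical example (the bucket name is made up); its field names follow those read by the Lambda code in the appendix, and real findings carry many more fields. `extract_finding` is an illustrative helper, not part of the solution.

```python
# Hypothetical, trimmed Macie finding event: only the fields the appendix
# Lambda reads are included, and every value is illustrative.
sample_event = {
    "detail-type": "Macie Finding",
    "source": "aws.macie",
    "detail": {
        "category": "CLASSIFICATION",
        "resourcesAffected": {
            "s3Bucket": {"name": "example-bucket"},
            "s3Object": {"key": "1-financial-data.txt"},
        },
        "classificationDetails": {
            "result": {
                "customDataIdentifiers": {"detections": [{"name": "ChineseID"}]},
                "sensitiveData": [
                    {"category": "FINANCIAL_INFORMATION",
                     "detections": [{"type": "CREDIT_CARD_NUMBER"}]},
                ],
            }
        },
    },
}


def extract_finding(event):
    """Return (bucket, key, sorted detected data types) from a Macie finding event."""
    detail = event["detail"]
    bucket = detail["resourcesAffected"]["s3Bucket"]["name"]
    key = detail["resourcesAffected"]["s3Object"]["key"]
    result = detail["classificationDetails"]["result"]
    types = set()
    # Custom data identifiers are reported by name
    for det in result.get("customDataIdentifiers", {}).get("detections", []):
        types.add(det["name"])
    # Managed data identifiers are grouped by category, each with its detections
    for group in result.get("sensitiveData", []):
        for det in group.get("detections", []):
            types.add(det["type"])
    return bucket, key, sorted(types)


print(extract_finding(sample_event))
```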
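Solution architecture diagram
The template in this example divides sensitive data tags into four levels; they are defined and explained below.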
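Deployment
Copy the two template files in the appendix (mapping.json and the CloudFormation template) into the local directory where you run the CLI, and set the following parameters:
- tagkey: the name (key) of the custom data tag
- level0 level1 level2 level3: the values of the four tag levels, ordered from lowest (0) to highest (3)
- s3filepath: the object key of the mapping file in S3
The mapping.json file defines how sensitive data types map to tag levels and is stored in S3; copy it from the appendix and save it locally. Following your enterprise's definitions, change the default 0 in each entry to the appropriate sensitivity level. For example, if you consider ADDRESS (residential address) to be level2 information, change the 0 after "ADDRESS": to 2. The Lambda function uses this file to select the highest level among all sensitive data types detected in an object and tags the object with that level. For example, if a file contains both ADDRESS (level 2) and BANK_ACCOUNT_NUMBER (level 3), Lambda applies the tag value for level3.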
{
"ADDRESS": 2,
"AUSTRALIA_DRIVERS_LICENSE": 0,
"AUSTRALIA_TAX_FILE_NUMBER": 0,
"AUSTRIA_DRIVERS_LICENSE": 0,
"AWS_CREDENTIALS": 0,
"BANK_ACCOUNT_NUMBER": 0,
"BELGIUM_DRIVERS_LICENSE": 0,
- region: the AWS Region in which to run the CloudFormation template
- stackname: the name of the CloudFormation stack
- template: the CloudFormation template from the appendix, saved as a YAML file
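The selection rule described for mapping.json (take the highest level among all detected types) can be sketched in a few lines of Python. `pick_tag_value` and the small mapping below are hypothetical illustrations, with BANK_ACCOUNT_NUMBER set to 3 to mirror the example above. Unlike the appendix Lambda, which only logs a message, this sketch raises an error when a level has no tag value.

```python
def pick_tag_value(detected_types, mapping, levels):
    """Return the tag value for the highest level among the detected types.

    mapping: data type -> level number (the contents of mapping.json)
    levels:  tag values ordered from level0 (lowest) upward
    Types missing from the mapping are ignored; None means nothing matched.
    """
    found = [mapping[t] for t in detected_types if t in mapping]
    if not found:
        return None
    highest = max(found)
    if highest >= len(levels):
        raise ValueError(f"level {highest} has no tag value defined")
    return levels[highest]


mapping = {"ADDRESS": 2, "BANK_ACCOUNT_NUMBER": 3, "PHONE_NUMBER": 0}
levels = ["公开", "内部", "保密", "机密"]  # level0 .. level3
print(pick_tag_value(["ADDRESS", "BANK_ACCOUNT_NUMBER"], mapping, levels))  # 机密
```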
Run the following sample commands to set the parameters:
tagkey='敏感度标识'  # "sensitivity label"
level0='公开'        # public
level1='内部'        # internal
level2='保密'        # confidential
level3='机密'        # secret
s3filepath=mapping.json
region=us-east-1
stackname=MacieAutotag
template=blog-template.yaml
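Because the tag key and values end up as S3 object tags, it can be worth checking them before deploying. The sketch below assumes the general AWS tagging rules (keys up to 128 and values up to 256 Unicode characters; Unicode letters, digits, spaces, and `_ . : / = + - @`); `check_tag_text` is a hypothetical helper, so please confirm the exact limits in the S3 documentation.

```python
import unicodedata

# Punctuation AWS documents as allowed in tags, besides letters, digits and spaces
ALLOWED_PUNCTUATION = set("_.:/=+-@")


def check_tag_text(text, max_len):
    """Return a list of problems found in a tag key or value (empty if none)."""
    problems = []
    if len(text) > max_len:
        problems.append(f"{text!r} is longer than {max_len} characters")
    for ch in text:
        # Unicode category L = letter, Z = separator (space), N = number
        if unicodedata.category(ch)[0] not in "LZN" and ch not in ALLOWED_PUNCTUATION:
            problems.append(f"{text!r} contains {ch!r}")
    return problems


tagkey = "敏感度标识"
level_values = ["公开", "内部", "保密", "机密"]
issues = check_tag_text(tagkey, 128)
for value in level_values:
    issues += check_tag_text(value, 256)
print(issues or "all tag text looks valid")
```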
The following sample command creates the CloudFormation stack:
aws cloudformation create-stack --stack-name $stackname --template-body file://$template \
--parameters \
ParameterKey=level0,ParameterValue="$level0" \
ParameterKey=level1,ParameterValue="$level1" \
ParameterKey=level2,ParameterValue="$level2" \
ParameterKey=level3,ParameterValue="$level3" \
ParameterKey=tagkey,ParameterValue="$tagkey" \
ParameterKey=s3filepath,ParameterValue="$s3filepath" \
--capabilities CAPABILITY_IAM \
--region=$region
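If you prefer to script the deployment in Python rather than the CLI, a rough boto3 equivalent might look like the following. It assumes boto3 is installed and AWS credentials are configured; `build_parameters` and `create_stack_from_file` are hypothetical helper names, while `create_stack` and the `stack_create_complete` waiter are standard boto3 CloudFormation client features. Only the parameter-building step runs in the example.

```python
def build_parameters(values):
    """Turn a dict of template parameter names to values into the list create_stack expects."""
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_stack_from_file(stack_name, template_path, values, region):
    """Create the stack with boto3 and wait for it (needs AWS credentials; not called below)."""
    import boto3  # imported here so build_parameters works without boto3 installed

    cfn = boto3.client("cloudformation", region_name=region)
    with open(template_path, encoding="utf-8") as f:
        body = f.read()
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=body,
        Parameters=build_parameters(values),
        Capabilities=["CAPABILITY_IAM"],  # the template creates an IAM role
    )
    # Block until creation finishes, which usually takes a few minutes
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response


params = {
    "level0": "公开", "level1": "内部", "level2": "保密", "level3": "机密",
    "tagkey": "敏感度标识", "s3filepath": "mapping.json",
}
print(build_parameters(params)[0])
```

Usage, with credentials in place: `create_stack_from_file("MacieAutotag", "blog-template.yaml", params, "us-east-1")`.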
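The CloudFormation stack takes a few minutes to create. When it completes, run the following sample CLI command to upload mapping.json to the newly created S3 bucket for the Lambda function to use: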
aws s3 cp $s3filepath s3://$(aws cloudformation describe-stacks --region $region --stack-name $stackname --query "Stacks[0].Outputs[?OutputKey=='S3BucketName'].OutputValue" --output text)/ --region $region
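The Lambda function only tags an object when the highest detected level has a matching tag value, so a quick local check of mapping.json before uploading can catch typos. This is a sketch assuming four levels (0 to 3); `check_mapping` and `check_mapping_file` are hypothetical helpers.

```python
import json


def check_mapping(mapping, num_levels=4):
    """Return problems: levels that are not integers or fall outside 0..num_levels-1."""
    problems = []
    for data_type, level in mapping.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(level, bool) or not isinstance(level, int):
            problems.append(f"{data_type}: level {level!r} is not an integer")
        elif not 0 <= level < num_levels:
            problems.append(f"{data_type}: level {level} is outside 0..{num_levels - 1}")
    return problems


def check_mapping_file(path, num_levels=4):
    """Load a mapping file such as mapping.json and check it."""
    with open(path, encoding="utf-8") as f:
        return check_mapping(json.load(f), num_levels)


print(check_mapping({"ADDRESS": 2, "NAME": 3, "MY_CUSTOM_ID": 4, "PHONE_NUMBER": "1"}))
```

Usage on the real file: `check_mapping_file("mapping.json")` should return an empty list before you upload it.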
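Sign in to the AWS Management Console, open the newly created Lambda function, and choose Configuration > Environment variables. The tag settings you defined earlier appear here; the Lambda function uses them to tag S3 objects according to Macie findings.
Lambda environment variables
Results
Run a Macie sensitive data discovery job. When it finishes, you can view the Lambda function's execution log in its CloudWatch log group in the AWS console. As shown below, based on the Macie finding CREDIT_CARD_NUMBER, Lambda tagged the file 1-financial-data.txt with the level3 tag value 机密 (secret).
Lambda execution log
Open the object in the S3 console and choose Properties > Tags to confirm that the tag has been applied.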
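You can also check the result from a script. The sketch below reads one tag with boto3's `get_object_tagging`, the same call the Lambda uses (the CLI equivalent is `aws s3api get-object-tagging`). `sensitivity_tag` is a hypothetical helper, and `FakeS3` is a stand-in so the example runs without AWS; in practice you would pass `boto3.client("s3")`.

```python
def sensitivity_tag(s3_client, bucket, key, tagkey):
    """Return the value of `tagkey` on an S3 object, or None if the tag is absent."""
    response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    for tag in response["TagSet"]:
        if tag["Key"] == tagkey:
            return tag["Value"]
    return None


class FakeS3:
    """Stand-in for boto3.client('s3') so the example runs without AWS access."""

    def get_object_tagging(self, Bucket, Key):
        return {"TagSet": [{"Key": "敏感度标识", "Value": "机密"}]}


print(sensitivity_tag(FakeS3(), "example-bucket", "1-financial-data.txt", "敏感度标识"))
```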
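Tags applied to the S3 object
Summary
In this post, I showed a simple way to define custom sensitive data tags and apply them automatically, triggered by Macie findings. The example defines four tag levels; you can change this to three or five to suit your enterprise's needs by updating the level parameters and Lambda environment variables in the template, as well as the range(4) loop in the Lambda code. The mapping.json template lists all of the Macie managed data identifiers (MDIs) available at the time of writing; if you use custom data identifiers (CDIs), simply add the CDI names to it. The example deploys to a single Region; you can use AWS CloudFormation StackSets to deploy the template across multiple AWS accounts and Regions.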
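One way to avoid editing the hard-coded `range(4)` loop when you change the number of levels is to read `level0`, `level1`, and so on until the first missing index. This is a hypothetical variant (`load_levels` is an illustrative name), not the code in the appendix; you would still add or remove the matching level parameters and environment variables in the CloudFormation template.

```python
import os


def load_levels(environ=os.environ):
    """Collect level0, level1, ... from the environment, stopping at the first gap."""
    levels = []
    while f"level{len(levels)}" in environ:
        levels.append(environ[f"level{len(levels)}"])
    if not levels:
        raise KeyError("no level0 environment variable found")
    return levels


print(load_levels({"level0": "公开", "level1": "内部", "level2": "机密"}))
```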
Appendix
mapping.json
{
"ADDRESS": 2,
"AUSTRALIA_DRIVERS_LICENSE": 0,
"AUSTRALIA_TAX_FILE_NUMBER": 0,
"AUSTRIA_DRIVERS_LICENSE": 0,
"AWS_CREDENTIALS": 0,
"BANK_ACCOUNT_NUMBER": 0,
"BELGIUM_DRIVERS_LICENSE": 0,
"BRAZIL_CEP_CODE": 0,
"BRAZIL_CNPJ_NUMBER": 0,
"BRAZIL_CPF_NUMBER": 0,
"BRAZIL_PHONE_NUMBER": 0,
"BRAZIL_RG_NUMBER": 0,
"BULGARIA_DRIVERS_LICENSE": 0,
"CANADA_DRIVERS_LICENSE": 0,
"CANADA_HEALTH_NUMBER": 0,
"CANADA_NATIONAL_IDENTIFICATION_NUMBER": 0,
"CANADA_PASSPORT_NUMBER": 0,
"CANADA_SOCIAL_INSURANCE_NUMBER": 0,
"CREDIT_CARD_EXPIRATION": 3,
"CREDIT_CARD_MAGNETIC_STRIPE": 0,
"CREDIT_CARD_NUMBER": 3,
"CREDIT_CARD_NUMBER_(NO_KEYWORD)": 3,
"CREDIT_CARD_SECURITY_CODE": 3,
"CROATIA_DRIVERS_LICENSE": 0,
"CYPRUS_DRIVERS_LICENSE": 0,
"CZECHIA_DRIVERS_LICENSE": 0,
"DATE_OF_BIRTH": 0,
"DENMARK_DRIVERS_LICENSE": 0,
"DRIVERS_LICENSE": 0,
"ESTONIA_DRIVERS_LICENSE": 0,
"EUROPEAN_HEALTH_INSURANCE_CARD_NUMBER": 0,
"FINLAND_DRIVERS_LICENSE": 0,
"FINLAND_EUROPEAN_HEALTH_INSURANCE_NUMBER": 0,
"FRANCE_BANK_ACCOUNT_NUMBER": 0,
"FRANCE_DRIVERS_LICENSE": 0,
"FRANCE_HEALTH_INSURANCE_NUMBER": 0,
"FRANCE_NATIONAL_IDENTIFICATION_NUMBER": 0,
"FRANCE_PASSPORT_NUMBER": 0,
"FRANCE_PHONE_NUMBER": 0,
"FRANCE_TAX_IDENTIFICATION_NUMBER": 0,
"GERMANY_BANK_ACCOUNT_NUMBER": 0,
"GERMANY_DRIVERS_LICENSE": 0,
"GERMANY_NATIONAL_IDENTIFICATION_NUMBER": 0,
"GERMANY_PASSPORT_NUMBER": 0,
"GERMANY_PHONE_NUMBER": 0,
"GERMANY_TAX_IDENTIFICATION_NUMBER": 0,
"GREECE_DRIVERS_LICENSE": 0,
"HTTP_BASIC_AUTH_HEADER": 0,
"HTTP_COOKIE": 0,
"HUNGARY_DRIVERS_LICENSE": 0,
"IRELAND_DRIVERS_LICENSE": 0,
"ITALY_BANK_ACCOUNT_NUMBER": 0,
"ITALY_DRIVERS_LICENSE": 0,
"ITALY_NATIONAL_IDENTIFICATION_NUMBER": 0,
"ITALY_PASSPORT_NUMBER": 0,
"ITALY_PHONE_NUMBER": 0,
"JSON_WEB_TOKEN": 0,
"LATITUDE_LONGITUDE": 0,
"LATVIA_DRIVERS_LICENSE": 0,
"LITHUANIA_DRIVERS_LICENSE": 0,
"LUXEMBOURG_DRIVERS_LICENSE": 0,
"MALTA_DRIVERS_LICENSE": 0,
"MEDICAL_DEVICE_UDI": 0,
"NAME": 3,
"NETHERLANDS_DRIVERS_LICENSE": 0,
"OPENSSH_PRIVATE_KEY": 0,
"PGP_PRIVATE_KEY": 0,
"PHONE_NUMBER": 0,
"PKCS": 0,
"POLAND_DRIVERS_LICENSE": 0,
"PORTUGAL_DRIVERS_LICENSE": 0,
"PUTTY_PRIVATE_KEY": 0,
"ROMANIA_DRIVERS_LICENSE": 0,
"SLOVAKIA_DRIVERS_LICENSE": 0,
"SLOVENIA_DRIVERS_LICENSE": 0,
"SPAIN_BANK_ACCOUNT_NUMBER": 0,
"SPAIN_DNI_NUMBER": 0,
"SPAIN_DRIVERS_LICENSE": 0,
"SPAIN_NIE_NUMBER": 0,
"SPAIN_NIF_NUMBER": 0,
"SPAIN_PASSPORT_NUMBER": 0,
"SPAIN_PHONE_NUMBER": 0,
"SPAIN_SOCIAL_SECURITY_NUMBER": 0,
"SPAIN_TAX_IDENTIFICATION_NUMBER": 0,
"SWEDEN_DRIVERS_LICENSE": 0,
"UK_BANK_ACCOUNT_NUMBER": 0,
"UK_DRIVERS_LICENSE": 0,
"UK_ELECTORAL_ROLL_NUMBER": 0,
"UK_NATIONAL_INSURANCE_NUMBER": 0,
"UK_NHS_NUMBER": 0,
"UK_PASSPORT_NUMBER": 0,
"UK_PHONE_NUMBER": 0,
"UK_TAX_IDENTIFICATION_NUMBER": 0,
"USA_HEALTHCARE_PROCEDURE_CODE": 0,
"USA_HEALTH_INSURANCE_CLAIM_NUMBER": 0,
"USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER": 0,
"USA_MEDICARE_BENEFICIARY_IDENTIFIER": 0,
"USA_NATIONAL_DRUG_CODE": 0,
"USA_NATIONAL_PROVIDER_IDENTIFIER": 0,
"USA_PASSPORT_NUMBER": 2,
"USA_SOCIAL_SECURITY_NUMBER": 1,
"US_DRUG_ENFORCEMENT_AGENCY_NUMBER": 0,
"VEHICLE_IDENTIFICATION_NUMBER": 0,
"ChineseID": 3
}
CloudFormation template
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  level0:
    Type: String
    Default: level0
    Description: the lowest value of the tag, e.g. public
  level1:
    Type: String
    Default: level1
    Description: the second-lowest value of the tag, e.g. internal
  level2:
    Type: String
    Default: level2
    Description: the second-highest value of the tag, e.g. sensitive
  level3:
    Type: String
    Default: level3
    Description: the highest value of the tag, e.g. secret
  tagkey:
    Type: String
    Default: datalevel
    Description: the key of the tag, e.g. datalabel
  s3filepath:
    Type: String
    Default: mapping.json
    Description: key name of the JSON file that maps Macie data identifiers to your defined levels
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      AccessControl: BucketOwnerFullControl
      LifecycleConfiguration:
        Rules:
          -
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 3
            NoncurrentVersionExpirationInDays: 3
            Status: Enabled
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      Tags:
        -
          Key: demo
          Value: macie-auto-tag
      VersioningConfiguration:
        Status: Enabled
  EventRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: EventRule
      EventPattern:
        source:
          - aws.macie
        detail-type:
          - Macie Finding
        detail:
          category:
            - CLASSIFICATION
      State: ENABLED
      Targets:
        - Arn:
            'Fn::GetAtt':
              - LambdaFunction
              - Arn
          Id: '1'
  PermissionForEventsToInvokeLambda:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName:
        Ref: LambdaFunction
      Action: 'lambda:InvokeFunction'
      Principal: events.amazonaws.com
      SourceArn:
        'Fn::GetAtt':
          - EventRule
          - Arn
  LambdaFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Runtime: python3.12
      Role: !GetAtt IAMRole.Arn
      Handler: index.lambda_handler
      Timeout: 600
      Environment:
        Variables:
          level0: !Ref level0
          level1: !Ref level1
          level2: !Ref level2
          level3: !Ref level3
          tagkey: !Ref tagkey
          mappingbucket: !Ref S3Bucket
          mappingfile: !Ref s3filepath
      Code:
        ZipFile: |
          import json
          import os
          import boto3

          s3 = boto3.client('s3')
          # Read the settings from the environment variables
          tagkey = os.environ['tagkey']
          mappingbucket = os.environ['mappingbucket']
          mappingfile = os.environ['mappingfile']
          # Tag values ordered from lowest (level0) to highest (level3).
          # If you use 3 or 5 levels, change range(4) and the environment variables.
          dataclass = [os.environ['level' + str(i)] for i in range(4)]

          # Read the customer-defined mapping JSON file from S3
          def getmapping(mappingbucket, mappingfile):
              obj = s3.get_object(Bucket=mappingbucket, Key=mappingfile)
              return json.load(obj['Body'])

          datadic = getmapping(mappingbucket, mappingfile)

          # Return the level of the object's existing sensitivity tag, plus the
          # object's other tags (the old sensitivity tag is dropped so it can be rewritten)
          def currenttag(filetag):
              oldlevel = 0
              othertags = []
              for each in filetag:
                  if each['Key'] != tagkey:
                      othertags.append(each)
                  elif each['Value'] in dataclass:
                      oldlevel = dataclass.index(each['Value'])
                  else:
                      print(each['Value'], 'is not a defined level, please check; treating it as level 0')
              return oldlevel, othertags

          # Return the highest level among the detected data types, never lower than oldlevel
          def taglevel(newtypelist, oldlevel):
              levels = []
              for each in newtypelist:
                  print('sensitive data type:', each)
                  if each in datadic:
                      levels.append(datadic[each])
                  else:
                      print(each, 'has no level defined, please update your mapping JSON!')
              print('found these sensitive levels: {}'.format(levels))
              return max(levels + [oldlevel])

          def gettag(s3name, filename):
              response = s3.get_object_tagging(Bucket=s3name, Key=filename)
              return response['TagSet']

          # put_object_tagging replaces the object's whole tag set
          def tag(s3name, filename, filetag):
              return s3.put_object_tagging(
                  Bucket=s3name,
                  Key=filename,
                  Tagging={'TagSet': filetag},
              )

          def lambda_handler(event, context):
              # Collect detected types per invocation; a module-level list would
              # accumulate results across warm invocations
              typelist = []
              # Get the S3 information based on the event type: Macie or Security Hub
              sourcetype = event['detail-type']
              if sourcetype == 'Macie Finding':  # Macie event: lowercase field names
                  detail = event['detail']
                  s3name = detail['resourcesAffected']['s3Bucket']['name']
                  filename = detail['resourcesAffected']['s3Object']['key']
                  # The object tags in the finding are not used because Macie reports them in lowercase
                  result = detail['classificationDetails']['result']
                  for each in result.get('customDataIdentifiers', {}).get('detections', []):
                      typelist.append(each['name'])
                  for item in result.get('sensitiveData', []):
                      for each in item['detections']:
                          typelist.append(each['type'])
              elif sourcetype == 'Security Hub Findings - Custom Action':  # Security Hub: capitalized field names
                  finding = event['detail']['findings'][0]
                  s3info = finding['ProductFields']['S3Object.Path']
                  s3name = s3info.split('/')[0]
                  filename = s3info[s3info.find('/') + 1:]
                  result = finding['Resources'][1]['DataClassification']['Result']
                  for each in result.get('CustomDataIdentifiers', {}).get('Detections', []):
                      typelist.append(each['Name'])
                  for item in result.get('SensitiveData', []):
                      for each in item['Detections']:
                          typelist.append(each['Type'])
              else:
                  print('unsupported event type: {}'.format(sourcetype))
                  return None
              newtypelist = list(set(typelist))
              # Compare the existing tag level with the new scan result and keep the higher one
              oldlevel, filetag = currenttag(gettag(s3name, filename))
              print('The old tag level is: {}'.format(oldlevel))
              valuelevel = taglevel(newtypelist, oldlevel)
              print('The new tag level is: {}'.format(valuelevel))
              if valuelevel < len(dataclass):
                  value = dataclass[valuelevel]
                  filetag.append({'Key': tagkey, 'Value': value})
                  # Tag the object in S3
                  print('Lambda is tagging object {0} in S3 bucket {1} with tag level {2}, tag value {3}'.format(filename, s3name, valuelevel, value))
                  return tag(s3name, filename, filetag)
              print('undefined tagging label for scanned types {}, please check your mapping file!'.format(newtypelist))
              return None
      Description: detect Macie findings to auto-tag S3 objects
      TracingConfig:
        Mode: Active
  IAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Description: basic Lambda role plus S3 object tagging
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Policies:
        - PolicyName: macie-eb-s3-lambda-policy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObjectTagging'
                  - 's3:GetObject'
                  - 's3:PutObjectTagging'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                  - 'xray:PutTraceSegments'
                  - 'xray:PutTelemetryRecords'
                Resource:
                  - '*'
Outputs:
  S3BucketName:
    Value: !Ref S3Bucket
    Description: name of the S3 bucket to upload mapping.json to
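Because `put_object_tagging` replaces an object's entire tag set, the Lambda has to carry the object's other tags over, and the comments in its code describe keeping the higher of the old and new sensitivity levels. The pure-Python sketch below isolates that step so it can be unit-tested without AWS. `merge_sensitivity_tag` is a hypothetical name; like the template, it treats an unknown existing value as level 0, and it assumes `new_level` is a valid index into `levels`.

```python
def merge_sensitivity_tag(tagset, tagkey, levels, new_level):
    """Return a new TagSet whose sensitivity tag is at max(existing, new) level.

    Other tags on the object are kept unchanged. An existing value that is
    not in `levels` is treated as level 0.
    """
    old_level = 0
    others = []
    for tag in tagset:
        if tag["Key"] == tagkey:
            if tag["Value"] in levels:
                old_level = levels.index(tag["Value"])
        else:
            others.append(tag)
    level = max(old_level, new_level)
    return others + [{"Key": tagkey, "Value": levels[level]}]


levels = ["公开", "内部", "保密", "机密"]
existing = [{"Key": "owner", "Value": "finance"}, {"Key": "敏感度标识", "Value": "机密"}]
# A new finding at level 2 does not downgrade the existing level-3 tag
print(merge_sensitivity_tag(existing, "敏感度标识", levels, 2))
```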
About the author