AWS Official Blog

Automatically Creating Customized CloudWatch Alarms: AWS Console Deployment

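1. Solution Overview

In the AWS cloud-native environment, each service offers an intuitive console and flexible APIs for creating monitoring alarms, which suits users who need to integrate alarms into their own platforms. However, there is no default entry point for enabling alarms across an entire service in one step, which adds configuration cost for users who expect things to work out of the box. Common scenarios include:

  • By default, AWS provides CloudWatch monitoring charts for the instances of many services but configures no alarms, and customers cannot configure an alarm on a given metric for all of those instances in one step;
  • After alarms are configured instance by instance, deleting an instance does not cascade to delete its alarms;
  • A new instance needs its alarms configured again, which cannot rely on manual creation in scenarios such as Auto Scaling.

This solution therefore automates both alarm creation and alarm message customization, making it easier for enterprises to route native AWS CloudWatch alarms to their internal alerting platforms. It provides the following features:

  1. Configure alarms uniformly for a metric across all clusters or instances of a service, for example an alarm for EC2 when CPU utilization stays above 90% for 5 minutes. The only requirement is that the service's instances support CloudWatch monitoring;
  2. Customize alarm messages, for example by displaying them in Chinese or adding severity information, and push them to an internal alerting platform, instant messaging tools (such as DingTalk in China, or Slack and Chime elsewhere), email, SMS, phone calls, and so on;
  3. Retrieve attributes of the affected instance, such as tag groupings, so that the enterprise can assign different severity levels to alarms from different resources;
  4. Automatically create the corresponding alarms when an instance of the service is created;
  5. Automatically cascade-delete the corresponding alarm configuration when an instance of the service is deleted;
  6. If an instance already has a corresponding alarm configured, the solution adds that instance to an allowlist: it neither configures an additional alarm nor overwrites the existing alarm settings, so it complements what is already in place;
  7. The author has written Lambda code templates that automatically create alarms for several major AWS services, including EBS, RDS, ALB, NLB, ElastiCache, ElasticSearch, and EMR;
  8. Two deployment methods are supported, the AWS CLI and the AWS console, and highly variable settings such as the alarm metric name have been consolidated into one place.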
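As an illustration of the EC2 example in feature 1, the following is a minimal sketch of the arguments that such an alarm could pass to the CloudWatch `put_metric_alarm` API. The function name, the alarm naming scheme, and the choice of the `Average` statistic are assumptions made for this example and are not taken from the author's templates.

```python
def build_cpu_alarm_params(instance_id, sns_topic_arn):
    """Return keyword arguments for CloudWatch put_metric_alarm.

    Models "CPU utilization above 90% for 5 minutes" as one 300-second
    evaluation period on the average CPUUtilization of a single instance.
    """
    return {
        # Hypothetical naming scheme: metric name plus instance ID.
        "AlarmName": "EC2_CPUUtilization-" + instance_id,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # seconds, i.e. 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 90.0,         # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# params = build_cpu_alarm_params("i-0123456789abcdef0", topic_arn)
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review the threshold and period before any alarm is actually created.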

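2. Results

3. Overall Architecture

4. Implementation Approach

  1. A resource of an AWS managed service that supports CloudWatch monitoring is created or deleted;
  2. A CloudWatch scheduled rule periodically invokes Lambda 1, which retrieves the full list of matching resources and the list of resources that already have CloudWatch alarms, compares the two, and automatically creates or cascade-deletes CloudWatch alarms accordingly;
  3. When a CloudWatch alarm fires, it sends a notification to SNS 1, which passes it to Lambda 2 for message customization. The customized message is then sent through SNS 2 to the relevant team members or to the enterprise's internal alerting platform.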
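The comparison in step 2 can be sketched as a small pure function. This is only an illustration of the idea, not the author's Lambda 1 code: the function name, the prefix-based alarm naming scheme, and the `skip_ids` parameter (standing in for the allowlist of instances that already have their own alarm) are all assumptions.

```python
def plan_alarm_changes(resource_ids, alarm_names, alarm_prefix,
                       skip_ids=(), max_items=None):
    """Compare live resources with existing auto-created alarms.

    Returns (to_create, to_delete):
      to_create - resource IDs that still need an alarm
      to_delete - alarm names whose resource no longer exists
    """
    resources = set(resource_ids)
    # Only alarms that carry our prefix are managed here; alarms created
    # by hand under other names are never touched.
    managed = {
        name[len(alarm_prefix):]: name
        for name in alarm_names
        if name.startswith(alarm_prefix)
    }
    to_create = sorted(resources - set(managed) - set(skip_ids))
    to_delete = sorted(
        name for rid, name in managed.items() if rid not in resources
    )
    if max_items is not None:
        to_create = to_create[:max_items]
    return to_create, to_delete


create, delete = plan_alarm_changes(
    ["db-1", "db-2", "db-3"],
    ["RDS_DatabaseConnections-db-1", "RDS_DatabaseConnections-db-9", "manual"],
    "RDS_DatabaseConnections-",
)
print(create, delete)
```

In a real function the two input lists would come from the service's describe call and from CloudWatch `describe_alarms`, and the two output lists would feed `put_metric_alarm` and `delete_alarms`.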

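The deployment has two parts:

  • Customize alarm messages
    • SNS 1 → Lambda 2 → SNS 2, deployed from back to front
  • Automatically create alarms
    • CloudWatch scheduled rule → Lambda 1 → CloudWatch alarm; deploy Lambda 1 first, then set up the CloudWatch scheduled rule

This article uses the RDS alarm metric DatabaseConnections as the example for the deployment guide. Because cloud developers may work on terminals on different platforms such as Windows and Mac, the solution uses the AWS console or AWS CLI together with PowerShell scripts, to stay as platform-independent as possible.

Readers are advised to follow the deployment guide below and use the AWS console for the first deployment. To enable CloudWatch alarms in bulk for other AWS services afterwards, refer to "Auto Create Customized CloudWatch Alarms – AWS CLI Deployment", which greatly shortens the time needed.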

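5. Deployment Guide

5.1 Create the Delivery Endpoint for Customized Alarm Messages – SNS 2

SNS 2 sends the customized alarm notifications to the relevant team.

Create the SNS topic <To-DBA_team> and select the Standard type.

Subscribe to the SNS topic. Notifications can be sent by email, or through a webhook API to an HTTP(S) endpoint.

Consider subscribing a personal email address first to check how the customized alarm message actually looks.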
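For readers who prefer to script this step, the console actions above correspond roughly to two SNS API calls. The sketch below takes the SNS client as a parameter so that it can be exercised without AWS credentials; the function name is hypothetical, and the email address in the usage comment is a placeholder.

```python
def create_topic_with_email(sns_client, topic_name, email_address):
    """Create a standard SNS topic and subscribe one email address to it.

    sns_client is expected to behave like boto3.client("sns").
    The recipient must confirm the subscription from the email they receive.
    """
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email_address,
    )
    return topic_arn


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# arn = create_topic_with_email(boto3.client("sns"), "To-DBA_team", "dba@example.com")
```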

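5.2 Create the Alarm Message Customization Script – Lambda 2

Lambda 2 customizes the messages of the automatically created alarms.

5.2.1 Create the IAM Role for Running Lambda

Create an IAM role. On the role creation page, choose the use case: Lambda.

5.2.2 Attach Permission Policies: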

  • Describe / Create / Drop CloudWatch alarms:

arn:aws:iam::aws:policy/CloudWatchFullAccess

  • Invoke Lambda Function:

arn:aws:iam::aws:policy/service-role/AWSLambdaRole

  • Publish SNS messages:

arn:aws:iam::aws:policy/AmazonSNSFullAccess

  • Put Logs in S3:

arn:aws:iam::aws:policy/AWSLambdaExecute

  • Describe / Alter RDS Attributes, Add Tags to RDS:

arn:aws:iam::aws:policy/aws-service-role/AWSApplicationAutoscalingRDSClusterPolicy

Role name: lambdaExecRole-autoCreateCxCwAlarms_RDS

Role description: Lambda execution role for Auto create customized CloudWatch alarms for RDS.
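The same role can also be created through the IAM API. The following is a minimal sketch under the assumption that the role only needs to be assumable by the Lambda service; the function name is hypothetical, and the policy ARNs are passed in by the caller rather than hard-coded.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_lambda_role(iam_client, role_name, description, policy_arns):
    """Create the execution role and attach each managed policy to it.

    iam_client is expected to behave like boto3.client("iam").
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
        Description=description,
    )
    for arn in policy_arns:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Role"]["Arn"]


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# create_lambda_role(
#     boto3.client("iam"),
#     "lambdaExecRole-autoCreateCxCwAlarms_RDS",
#     "Lambda execution role for Auto create customized CloudWatch alarms for RDS.",
#     ["arn:aws:iam::aws:policy/CloudWatchFullAccess"],  # plus the other ARNs in 5.2.2
# )
```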

5.2.3 Create the Lambda Layer

Install the pytz library, which is used to localize the time zone.

anqdian@3c22fb7680e6 autoCreateCxCw % mkdir python
anqdian@3c22fb7680e6 autoCreateCxCw % ls
python
anqdian@3c22fb7680e6 autoCreateCxCw % /usr/bin/pip3 install -t ./python pytz
Collecting pytz
 Using cached pytz-2021.1-py2.py3-none-any.whl (510 kB)
Installing collected packages: pytz
Successfully installed pytz-2021.1

In the python directory just created, create the file changeAlarmToLocalTimeZone.py with the following content, then package it:

import json
import boto3
import datetime
import re
import urllib.parse
import pytz

def searchAvailableTimezones(zone):
    for s in pytz.all_timezones:
        if re.search(zone, s, re.IGNORECASE):
            print('Matched Zone: {}'.format(s))

def getAllAvailableTimezones():
    for tz in pytz.all_timezones:
        print (tz)

def changeAlarmToLocalTimeZone(event,timezoneCode,localTimezoneInitial,platform_endpoint):
    tz = pytz.timezone(timezoneCode)
    #extract the Alarm event from the SNS records
    AlarmEvent = json.loads(event['Records'][0]['Sns']['Message'])

    #extract event data like alarm name, region, state, timestamp
    alarmName=AlarmEvent['AlarmName']
    descriptionexist=0
    if "AlarmDescription" in AlarmEvent:
        description= AlarmEvent['AlarmDescription']
        descriptionexist=1
    reason=AlarmEvent['NewStateReason']
    region=AlarmEvent['Region']
    state=AlarmEvent['NewStateValue']
    previousState=AlarmEvent['OldStateValue']
    timestamp=AlarmEvent['StateChangeTime']
    Subject= event['Records'][0]['Sns']['Subject']
    alarmARN=AlarmEvent['AlarmArn']
    RegionID=alarmARN.split(":")[3]
    AccountID=AlarmEvent['AWSAccountId']

    #get the datapoints substring
    pattern = re.compile(r'\[(.*?)\]')
    
    #test if pattern match and there is datapoints
    if pattern.search(reason):
        Tempstr = pattern.findall(reason)[0]

        #get in the message all datapoints timestamps and convert to localTimezone using same format
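        pattern = re.compile(r'\(.*?\)')
        m = pattern.finditer(Tempstr)
        for match in m:
            Tempstr=match.group()
            tempStamp = datetime.datetime.strptime(Tempstr, "(%d/%m/%y %H:%M:%S)")
            # datapoint times are reported in UTC; mark them as UTC before converting
            tempStamp = tempStamp.replace(tzinfo=datetime.timezone.utc).astimezone(tz)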
            tempStamp = tempStamp.strftime('%d/%m/%y %H:%M:%S')
            reason=reason.replace(Tempstr, '('+tempStamp+')')
    

    #convert timestamp to localTimezone time
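    timestamp = timestamp.split(".")[0]
    timestamp = datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    # StateChangeTime is in UTC (+0000); attach UTC explicitly so the result does not depend on the host time zone
    localTimeStamp = timestamp.replace(tzinfo=datetime.timezone.utc).astimezone(tz)
    # include the day of the month, which the original format string omitted
    localTimeStamp = localTimeStamp.strftime("%A %d %B, %Y %H:%M:%S")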

    #create Custom message and change timestamps

    customMessage='You are receiving this email because your Amazon CloudWatch Alarm "'+alarmName+'" in the '+region+' region has entered the '+state+' state, because "'+reason+'" at "'+localTimeStamp+' '+localTimezoneInitial+'".'
    
    # Add Console link
    customMessage=customMessage+'\n\n View this alarm in the AWS Management Console: \n'+ 'https://'+RegionID+'.console.aws.amazon.com/cloudwatch/home?region='+RegionID+'#s=Alarms&alarm='+urllib.parse.quote(alarmName)
    
    #Add Alarm Name
    customMessage=customMessage+'\n\n Alarm Details:\n- Name:\t\t\t\t\t\t'+alarmName
    
    # Add alarm description if exist
    if (descriptionexist == 1) : customMessage=customMessage+'\n- Description:\t\t\t\t\t'+description
    customMessage=customMessage+'\n- State Change:\t\t\t\t'+previousState+' -> '+state

    # Add alarm reason for changes
    customMessage=customMessage+'\n- Reason for State Change:\t\t'+reason
 
    # Add alarm evaluation timeStamp   
    customMessage=customMessage+'\n- Timestamp:\t\t\t\t\t'+localTimeStamp+' '+localTimezoneInitial

    # Add AccountID    
    customMessage=customMessage+'\n- AWS Account: \t\t\t\t'+AccountID
    
    # Add Alarm ARN
    customMessage=customMessage+'\n- Alarm Arn:\t\t\t\t\t'+alarmARN

    #push message to SNS topic
    response = platform_endpoint.publish(
        Message=customMessage,
        Subject=Subject,
        MessageStructure='string'
    )

anqdian@3c22fb7680e6 autoCreateCxCw % zip -r SNSSubscribtion-pytzLayer.zip ./python/*
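The core of the module above is the conversion of UTC timestamps inside the alarm payload to local time. The sketch below isolates that step so it can be tried locally without the layer. It is an illustration rather than a replacement: it uses a fixed UTC+8 offset from the standard library instead of pytz (an assumption that ignores daylight saving rules), and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for pytz's 'Asia/Hong_Kong'.
LOCAL_TZ = timezone(timedelta(hours=8))

# Matches datapoint timestamps such as "(04/12/20 03:56:00)".
DATAPOINT_TS = re.compile(r"\((\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\)")


def localize_reason(reason, tz=LOCAL_TZ):
    """Rewrite every UTC datapoint timestamp in NewStateReason to local time."""
    def convert(match):
        utc_time = datetime.strptime(match.group(1), "%d/%m/%y %H:%M:%S")
        utc_time = utc_time.replace(tzinfo=timezone.utc)
        return "(" + utc_time.astimezone(tz).strftime("%d/%m/%y %H:%M:%S") + ")"
    return DATAPOINT_TS.sub(convert, reason)


def localize_state_change_time(state_change_time, tz=LOCAL_TZ):
    """Convert a StateChangeTime like '2020-12-04T03:57:01.659+0000' to local time."""
    utc_time = datetime.strptime(state_change_time.split(".")[0], "%Y-%m-%dT%H:%M:%S")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(tz)


reason = ("Threshold Crossed: 1 out of the last 1 datapoints "
          "[0.0 (04/12/20 03:56:00)] was greater than or equal to the threshold (0.0).")
print(localize_reason(reason))
print(localize_state_change_time("2020-12-04T03:57:01.659+0000"))
```

Matching only the timestamp pattern leaves other parenthesized values, such as the threshold `(0.0)`, unchanged.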

Create the Lambda layer:

Name: customizedAlarms-RDS_DatabaseConnections

Description: Customize CloudWatch alarms for RDS – DatabaseConnections.

Runtime: Python 3.8

Upload the .zip file: SNSSubscribtion-pytzLayer.zip
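For Python runtimes, Lambda expects a layer's libraries under a top-level `python/` folder inside the archive, which is why the packaging step zips the `python` directory itself. On a machine without the `zip` command, the same archive could be produced with the standard library, as in this sketch (the function name is hypothetical):

```python
import zipfile
from pathlib import Path


def build_layer_zip(source_dir, zip_path):
    """Zip every file in source_dir under a top-level 'python/' folder."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store the file as python/<relative path> inside the archive.
                arcname = (Path("python") / path.relative_to(source_dir)).as_posix()
                archive.write(path, arcname)
    return zip_path


# Usage, run from the folder that contains the 'python' directory:
# build_layer_zip("python", "SNSSubscribtion-pytzLayer.zip")
```

Write the archive outside the directory being zipped so that it does not try to include itself.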

5.2.4 PowerShell on Mac

Download PowerShell and choose the macOS 10.13+ package:

https://github.com/PowerShell/PowerShell

When installing PowerShell on a Mac, allow the installation under System Preferences → Security & Privacy.

Install the AWS Tools modules and the AWS CLI, and upgrade urllib3:

https://docs.aws.amazon.com/zh_cn/powershell/latest/userguide/pstools-getting-set-up-linux-mac.html

# Windows: 
# [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# Start PowerShell:
pwsh
Install-Module -Name AWS.Tools.Installer -Force
Install-Module -Name AWS.Tools.Common
Install-Module -Name AWS.Tools.Lambda,AWS.Tools.SecurityToken
Install-Module AWSPowerShell
Install-Module AWSLambdaPSCore

pip install --upgrade "urllib3==1.26" awscli

5.2.5 Deploy Lambda 2

Prepare the following four files: Deploy.ps1, index.py, requirements.txt, and setup.cfg, and place them in a dedicated folder named autoCreateCxCw_RDS-Lambda2.

Run Deploy.ps1 in PowerShell to deploy the Lambda function.

Deploy.ps1:

Set-DefaultAWSRegion -Region <us-west-2>
Set-Location -Path $PSScriptRoot

$ZipFileName = 'lambda2-autoCreateCxCw.zip'


Write-Host -Object 'Restoring dependencies ...'
pip3 install -r $PSScriptRoot/requirements.txt -t $PSScriptRoot/


Write-Host -Object 'Compressing files ...'
Get-ChildItem -Recurse | ForEach-Object -Process {
  $NewPath = $PSItem.FullName.Substring($PSScriptRoot.Length + 1)
  zip -u "$PSScriptRoot/$ZipFileName" $NewPath
# Windows:
# Compress-Archive -Path $NewPath -Update -DestinationPath "$PSScriptRoot\$ZipFileName"
}


Write-Host -Object 'Deploying Lambda function'
$Function = @{
  FunctionName = 'CustomizeCloudWatchAlarmsNotifications-RDS_DatabaseConnections'
  Runtime = 'python3.8'
  Description = 'Customize CloudWatch alarms notification for RDS - DatabaseConnections. '
  ZipFilename = $ZipFileName
  Handler = 'index.lambda_handler'
  Role = '<arn:aws:iam::532134256174:role/lambdaExecRole-autoCreateCxCwAlarms_RDS>'
  Environment_Variable = @{
    NotificationSNSTopic = '<arn:aws:sns:us-west-2:532134256174:To-DBA_team>'
    TimeZoneCode = 'Asia/Hong_Kong'
    TimezoneInitial = 'UTC+8'
  # CHIME_WEBHOOK = 'https://hooks.chime.aws/incomingwebhooks/3c8fd66f-6e40-4375-9fe8-0ba6a57cb375?token=aWVuczdtTUd8MXxCZC05SmNIZ3RqUFMydXpydllNTUx2em15WU5YZVNrX0ZodWc3THljdFg0'
  }
  MemorySize = 512
  Timeout = 60
  Layer = "<arn:aws:lambda:us-west-2:532134256174:layer:customizedAlarms-RDS_DatabaseConnections:1>"
}

Remove-LMFunction -FunctionName $Function.FunctionName -Force
Publish-LMFunction @Function


Write-Host -Object 'Deployment completed' -ForegroundColor Green

index.py:

import boto3
import os
from changeAlarmToLocalTimeZone import *

#Get SNS Topic ARN from Environment variables
NotificationSNSTopic = os.environ['NotificationSNSTopic']

#Get timezone corresponding to your localTimezone from Environment variables
timezoneCode = os.environ['TimeZoneCode']

#Get Your local timezone Initials, E.g UTC+2, IST, AEST...etc from Environment variables
localTimezoneInitial=os.environ['TimezoneInitial']

#Get SNS resource using boto3
SNS = boto3.resource('sns')

#Specify the SNS topic to push message to by ARN
platform_endpoint = SNS.PlatformEndpoint(NotificationSNSTopic)

def lambda_handler(event, context):

    #Call Main function
    changeAlarmToLocalTimeZone(event,timezoneCode,localTimezoneInitial,platform_endpoint)
    
    #Print All Available timezones
    #getAllAvailableTimezones()
    
    #search if Timezone/Country exist
    #searchAvailableTimezones('sy')

requirements.txt:
requests

setup.cfg:

[install]
prefix=

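5.2.6 Tune the Lambda 2 Settings

Runtime settings: the file prepared in section 5.2.5 is named index.py and lambda_handler is the function's entry point, so make sure the handler under "Runtime settings" in Lambda is: index.lambda_handler

The following JSON can be used as a test event. Because it differs from the notifications that SNS actually sends, it is not suitable for testing connectivity between Lambda 2 and the upstream SNS 1.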

{
    "Records": [
        {
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:lambda:us-west-2:532134256174:function:CustomizeCloudWatchAlarmsNotifications-RDS_DatabaseConnections",
            "Sns": {
                "Type": "Notification",
                "MessageId": "f9f5ed56-3d38-57c8-b4ea-b51588f5f871",
                "TopicArn": "arn:aws:sns:us-west-2:532134256174:customizedAlarmAction-RDS_DatabaseConnections",
                "Subject": "ALARM: \"Test LocalTime\" in China, Asia (Hong Kong)",
                "Message": "{\"AlarmName\":\"RDS_DatabaseConnections\",\"AlarmDescription\":\"Auto-created customized CloudWatch Alarm <RDS_DatabaseConnections>\",\"AWSAccountId\":\"532134256174\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [0.0 (04/12/20 03:56:00)] was greater than or equal to the threshold (0.0) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"2020-12-04T03:57:01.659+0000\",\"Region\":\"US West (Oregon)\",\"AlarmArn\":\"arn:aws:cloudwatch:us-west:532134256174:alarm:RDS_DatabaseConnections LocalTime\",\"OldStateValue\":\"OK\",\"Trigger\":{\"Period\":60,\"EvaluationPeriods\":1,\"ComparisonOperator\":\"GreaterThanOrEqualToThreshold\",\"Threshold\":0.0,\"TreatMissingData\":\"- TreatMissingData: missing\",\"EvaluateLowSampleCountPercentile\":\"\",\"Metrics\":[{\"Expression\":\"FILL(m1, 0)\",\"Id\":\"e1\",\"Label\":\"Expression1\",\"ReturnData\":true},{\"Id\":\"m1\",\"MetricStat\":{\"Metric\":{\"Dimensions\":[{\"value\":\"API\",\"name\":\"Type\"},{\"value\":\"DescribeAlarms\",\"name\":\"Resource\"},{\"value\":\"CloudWatch\",\"name\":\"Service\"},{\"value\":\"None\",\"name\":\"Class\"}],\"MetricName\":\"CallCount\",\"Namespace\":\"AWS/Usage\"},\"Period\":60,\"Stat\":\"Average\"},\"ReturnData\":false}]}}",
                "Timestamp": "2020-12-04T03:57:01.702Z",
                "SignatureVersion": "1",
                "Signature": "WcgVMPrlQsJY3yqbds968tqKPC6KKDWHSjIwEmzKVHZYg6foN9F5sm2Tp5IWPgaM9wMmYg8dpQjkxSm4q9V9iP1PbLp81RgJS2NghdeHNVnyxyzywXFMDztYZpgB2pjzfT101RVGpUwVPntOpBeBq2KAs/NrFX1nS2aTK/OX+gyOxwYZxRftzd+ttHA+PCh0kKlym7nnxaWuO9hgSrnupH2YttuvsdTSAOZ4MGhBON/sMmmlcxzfiFD+jJaqlHFmQ0DncjSe1NNwceOpwNsue6//sMYU1QzV6bO34I343KmQdXYw/KISDz7qH70Odm7nRLN3ExSOhtC/FS0/dXGl4Q==",
                "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-010a507c1833636cd94bdb98bd93083a.pem",
                "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:532134256174:customizedAlarmAction-RDS_DatabaseConnections",
                "MessageAttributes": {}
            }
        }
    ]
}
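In the event above, the alarm payload sits in `Records[0].Sns.Message` as a JSON string rather than a nested object, so it has to be decoded a second time. The sketch below shows that unpacking on its own; the function name and the returned field names are assumptions chosen for this example.

```python
import json


def parse_alarm_event(event):
    """Extract the fields used for the customized message from an SNS event."""
    sns = event["Records"][0]["Sns"]
    alarm = json.loads(sns["Message"])  # Message is a JSON-encoded string
    return {
        "subject": sns["Subject"],
        "alarm_name": alarm["AlarmName"],
        "description": alarm.get("AlarmDescription"),  # may be absent or null
        "state": alarm["NewStateValue"],
        "previous_state": alarm["OldStateValue"],
        # The fourth ARN field is the region ID, e.g. us-west-2.
        "region_id": alarm["AlarmArn"].split(":")[3],
        "account_id": alarm["AWSAccountId"],
    }


sample_event = {
    "Records": [{
        "Sns": {
            "Subject": "ALARM: demo",
            "Message": json.dumps({
                "AlarmName": "RDS_DatabaseConnections",
                "NewStateValue": "ALARM",
                "OldStateValue": "OK",
                "AlarmArn": "arn:aws:cloudwatch:us-west-2:111122223333:alarm:demo",
                "AWSAccountId": "111122223333",
            }),
        }
    }]
}
print(parse_alarm_event(sample_event))
```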

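5.3 Customize Alarm Messages – SNS 1

SNS 1 receives alarm notifications and forwards them to Lambda 2, which customizes them.

Create the SNS topic <customizedAlarmAction-RDS_DatabaseConnections> and select the Standard type.

Create a subscription on the topic and set its endpoint to the ARN of the Lambda function <CustomizeCloudWatchAlarmsNotifications-RDS_DatabaseConnections>.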
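When this subscription is created through the API instead of the console, the function also needs a resource-based permission that allows SNS to invoke it. A minimal sketch, with both clients passed in as parameters and a hypothetical function name and statement ID:

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_arn):
    """Allow the topic to invoke the function, then subscribe the function.

    The clients are expected to behave like boto3.client("sns") and
    boto3.client("lambda").
    """
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowInvokeFromSNS",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
    )
    return response["SubscriptionArn"]
```

Restricting the permission with `SourceArn` means only this topic can trigger the function.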

This completes the first part of the solution, customizing alarm messages.

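5.4 Automatically Create Alarms – Lambda 1

Lambda 1 automatically creates a specific alarm for every instance of the specified AWS managed service.

5.4.1 Deploy Lambda 1

Prepare the following four files: Deploy.ps1, index.py, requirements.txt, and setup.cfg, and place them in a dedicated folder named autoCreateCxCw_RDS-Lambda1.

Run Deploy.ps1 in PowerShell to deploy the Lambda function.

Deploy.ps1: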

Set-DefaultAWSRegion -Region <us-west-2>
Set-Location -Path $PSScriptRoot

$ZipFileName = 'lambda1-autoCreateCxCw.zip'


Write-Host -Object 'Restoring dependencies ...'
pip3 install -r $PSScriptRoot/requirements.txt -t $PSScriptRoot/


Write-Host -Object 'Compressing files ...'
Get-ChildItem -Recurse | ForEach-Object -Process {
  $NewPath = $PSItem.FullName.Substring($PSScriptRoot.Length + 1)
  zip -u "$PSScriptRoot/$ZipFileName" $NewPath
# Windows: 
# Compress-Archive -Path $NewPath -Update -DestinationPath "$PSScriptRoot\$ZipFileName"
}


Write-Host -Object 'Deploying Lambda function'
$Function = @{
  FunctionName = 'AutoCreateCloudWatchAlarms-RDS_DatabaseConnections'
  Runtime = 'python3.8'
  Description = 'Auto create customized CloudWatch alarms for RDS - DatabaseConnections. '
  ZipFilename = $ZipFileName
  Handler = 'index.handler'
  Role = '<arn:aws:iam::532134256174:role/lambdaExecRole-autoCreateCxCwAlarms_RDS>'
  Environment_Variable = @{
    MetricName = 'DatabaseConnections'
    MaxItems = '3'
    SNS_topic_suffix = 'RDS_DatabaseConnections'
    # CHIME_WEBHOOK = 'https://hooks.chime.aws/incomingwebhooks/3c8fd66f-6e40-4375-9fe8-0ba6a57cb375?token=aWVuczdtTUd8MXxCZC05SmNIZ3RqUFMydXpydllNTUx2em15WU5YZVNrX0ZodWc3THljdFg0'
  }
  MemorySize = 512
  Timeout = 60
}

Remove-LMFunction -FunctionName $Function.FunctionName -Force
Publish-LMFunction @Function


Write-Host -Object 'Deployment completed' -ForegroundColor Green

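Before rolling the solution out fully, set MaxItems, the maximum number of RDS CloudWatch alarms to create, to 3 so that the effect can be verified on a small scale first.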
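One way the function could consume these environment variables is sketched below. This is an assumption about how such settings might be read, not the author's index.py: the function names are hypothetical, and building the SNS 1 topic name as `customizedAlarmAction-` plus `SNS_topic_suffix` is inferred from the topic name used in section 5.3.

```python
def load_settings(environ):
    """Read the Lambda 1 settings from a mapping such as os.environ."""
    max_items = int(environ.get("MaxItems", "0"))
    return {
        "metric_name": environ["MetricName"],
        # Treat 0 or a missing value as "no limit" (an assumption).
        "max_items": max_items if max_items > 0 else None,
        # Assumed naming rule for the SNS 1 topic.
        "sns_topic_name": "customizedAlarmAction-" + environ["SNS_topic_suffix"],
    }


def limit_targets(instance_ids, max_items):
    """Keep at most max_items instances for a trial run."""
    ids = list(instance_ids)
    return ids if max_items is None else ids[:max_items]


settings = load_settings({
    "MetricName": "DatabaseConnections",
    "MaxItems": "3",
    "SNS_topic_suffix": "RDS_DatabaseConnections",
})
print(settings)
print(limit_targets(["db-1", "db-2", "db-3", "db-4"], settings["max_items"]))
```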

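index.py:

The author has adapted templates for commonly used AWS services such as RDS, ElasticSearch, ElastiCache, EMR, ELB, and EBS. For example, to automatically create alarms on the CPU utilization of RDS instances, complete these two steps:

  1. Use the "RDS – CPUUtilization" template as the content of index.py for Lambda 1, and check and adjust the alarm threshold specified in it;
  2. In the Environment_Variable block of Deploy.ps1 for Lambda 1, set the MetricName environment variable to the corresponding CloudWatch metric name (MetricName = 'CPUUtilization').

GitHub links to the Lambda code templates that automatically create alarms for some commonly used AWS services: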

RDS – DatabaseConnections:

RDS DatabaseConnections – vProd

RDS Truncate – vProd

RDS – CPUUtilization:

RDS CPUUtilization – vProd

ElasticSearch – JVMMemoryPressure:

ES JVMMemoryPressure – vProd

ElastiCache – DatabaseMemoryUsagePercentage:

EC DatabaseMemoryUsagePercentage – vProd

EMR – HDFSUtilization:

EMR HDFSUtilization – vProd

ALB – HTTPCode_Target_5XX_Count:

ALB HTTPCode_Target_5XX_Count – vProd SourceCode

ALB Truncate – vProd

NLB – ActiveFlowCount:

NLB ActiveFlowCount – vProd SourceCode

EBS – BurstBalance:

EBS BurstBalance – vProd

 

requirements.txt:
requests

setup.cfg:

[install]
prefix=

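5.4.2 Tune the Lambda 1 Settings

  1. Runtime settings: the file prepared in section 5.4.1 is named index.py and handler is the function's entry point, so make sure the handler under "Runtime settings" in Lambda is: index.handler

Test the Lambda function

Configure a test event in Lambda and choose sns-notification as the event template. Alternatively, replace the template content with the real alarm notification below.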

{
    "Records": [
        {
          "EventSource": "aws:sns",
          "EventVersion": "1.0",
          "EventSubscriptionArn": "arn:aws:sns:us-west-2:532134256174:chimewebhook:9809d03a-21a0-4aba-8a2f-d2554cdeac34",
          "Sns": {
              "Type": "Notification",
              "MessageId": "c07ee68e-9dfb-5b65-924e-becec206c0f1",
              "TopicArn": "arn:aws:sns:us-west-2:532134256174:chimewebhook",
              "Subject": "ALARM: 'CW-chime' in US West (Oregon)",
              "Message": {
                  "AlarmName":"CW-chime",
                  "AlarmDescription":null,
                  "AWSAccountId":"532134256174",
                  "NewStateValue":"ALARM",
                  "NewStateReason":"Threshold Crossed: 1 out of the last 1 datapoints [1.0 (01/12/20 15:12:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
                  "StateChangeTime":"2020-12-01T15:14:05.915+0000",
                  "Region":"US West (Oregon)",
                  "AlarmArn":"arn:aws:cloudwatch:us-west-2:532134256174:alarm:CW-chime",
                  "OldStateValue":"INSUFFICIENT_DATA",
                  "Trigger":{
                      "MetricName":"HTTPCode_ELB_5XX_Count",
                      "Namespace":"AWS/ApplicationELB",
                      "StatisticType":"Statistic",
                      "Statistic":"AVERAGE",
                      "Unit":null,
                      "Dimensions":[{
                          "value":"app/ELB-CW-SearchTest/860a113ea68c543f",
                          "name":"LoadBalancer"
                      }],
                      "Period":60,
                      "EvaluationPeriods":1,
                      "ComparisonOperator":"GreaterThanOrEqualToThreshold",
                      "Threshold":1.0,
                      "TreatMissingData":"- TreatMissingData: missing",
                      "EvaluateLowSampleCountPercentile":""
                  }
              },
              "Timestamp": "2020-12-01T15:14:05.969Z",
              "SignatureVersion": "1",
              "Signature": "jD0bB3UVT7Boy/SEyVkDy0JCNynjkeMBb4WlqG7Vm3+HDatnXDQrBHAayQ8VQmDgyA9pbdESKeJufdhE77R/73dQ+XX27CnsMQore46J+dNTqEeIKwThT8lmZtUWGypu1fPxpVFZl8eKcZhqjN5pK+OC8u+KdglnJGPkFok/UHZLwMe321oVVvxQznEF/zJGRC+tEUd+3aN/IlaNNZHjFduFnOt0WZDvAK42/3jnsfEk2DzpE7hsRd2+eUJfRIZbCnpxdsFZnsh42fzt44mjNXAoqk9TjbddWtaS5ERkS5vuJHTNDSqes1xCJIgpljkaCO4xQbRg/ZH0+4dGyNDSXw==",
              "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-010a507c1833636cd94bdb98bd93083a.pem",
              "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:532134256174:chimewebhook:9809d03a-21a0-4aba-8a2f-d2554cdeac34",
              "MessageAttributes": {}
          }
        }
    ]
}

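5.5 Automatically Create Alarms – CloudWatch Scheduled Rule

Create a CloudWatch scheduled rule that periodically invokes Lambda 1 to create alarms.

Create the CloudWatch Events rule:

Event source: Schedule

Fixed rate: 1 minute

Target: Lambda function

Name: AutoCreateCloudWatchAlarms

Description: Scheduler to run Lambda function <AutoCreateCloudWatchAlarms> every 1 min.

Note that once the rule is created and enabled, the Lambda function <AutoCreateCloudWatchAlarms-RDS_DatabaseConnections> runs every minute, so decide deliberately whether to enable the rule. To test the effect before a full rollout, see section 5.4 and, in the <Prepare target RDS list> code segment, lower the maximum number of RDS CloudWatch alarms to create to 2.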
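The rule settings above map onto the CloudWatch Events (EventBridge) API roughly as follows. This is a sketch with the clients injected as parameters; the function name, the statement ID, and the target ID are hypothetical. The rule is created disabled by default so that nothing runs until it is switched on deliberately.

```python
def schedule_lambda(events_client, lambda_client, rule_name, function_arn,
                    enabled=False):
    """Create a 1-minute schedule rule that targets the given Lambda function.

    The clients are expected to behave like boto3.client("events") and
    boto3.client("lambda").
    """
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 minute)",
        State="ENABLED" if enabled else "DISABLED",
        Description="Scheduler to run Lambda function "
                    "<AutoCreateCloudWatchAlarms> every 1 min.",
    )["RuleArn"]
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowInvokeFromEvents",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn
```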

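With the steps above complete, the functionality described in this article is in place. Later, index.py can be modified directly in Lambda to suit enterprise needs, so that Lambda automatically deploys more kinds of customized alarms and delivers them in a way tailored to the enterprise.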
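As one example of such a customization, the customized message could also be posted to a chat tool through an incoming webhook. The sketch below only builds the HTTP request and does not send it. The function name is hypothetical, and the payload keys (`Content` for Chime, `text` for Slack) reflect the commonly documented incoming-webhook formats and should be checked against each tool's current documentation.

```python
import json
import urllib.request

# Assumed JSON key that carries the message body for each tool.
BODY_KEYS = {"chime": "Content", "slack": "text"}


def build_webhook_request(webhook_url, message, tool="chime"):
    """Build (but do not send) a POST request for an incoming webhook."""
    payload = json.dumps({BODY_KEYS[tool]: message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (sends a real HTTP request, so it is left commented out):
# request = build_webhook_request(os.environ["CHIME_WEBHOOK"], custom_message)
# urllib.request.urlopen(request, timeout=10)
```

Reading the webhook URL from an environment variable keeps the token out of the source code.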

The boto3 developer documentation is linked below for reference; it contains detailed guidance and typical examples.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html

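About the Author

佃安祺

Technical Account Manager at Amazon Web Services (亚马逊云科技), responsible for architecture and cost optimization and for technical support for enterprise customers. Before joining Amazon Web Services, the author worked at several multinational companies, including HSBC Software Development, and has 8 years of experience in IT architecture, operations, and customer support. The author currently focuses on industries such as internet smart device manufacturing.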