Visualizing Amazon Neptune Graph Database Analytics with a Serverless Architecture

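The Amazon Neptune graph database became generally available on May 30, 2018, a little over a year ago, and has since expanded to 12 regions worldwide. It is highly available and provides read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. Database cloning was added recently: it creates a clone of a Neptune DB cluster quickly and cost-effectively, without affecting the production environment, and a new clone needs very little additional storage when it is first created.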

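Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At its core is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports the common graph models Property Graph and W3C RDF, together with their query languages Apache TinkerPop Gremlin 3.4.1 and SPARQL 1.1. It covers most graph use cases, such as social networking, recommendation engines, fraud detection, knowledge graphs, life sciences, and network/IT operations.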

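At present, AWS does not officially provide a front-end tool or service for visualizing Neptune graph data analysis. In this post we combine the browser-based VIS.js dynamic visualization library with a serverless approach, using Amazon S3 static website hosting together with Amazon API Gateway and AWS Lambda, to visualize data analysis on a Neptune graph database.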

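(1) Recommended visualization options for Neptune

Users can visualize Amazon Neptune data analysis with solutions from AWS APN partners. Most of these solutions are currently available as AMIs in AWS Marketplace. AWS partners in this area include: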

Tom Sawyer Software
Metaphactory
Keylines by Cambridge Intelligence

Besides the commercial solutions above, three open source options are worth considering:

GraphExp open source visualization tool
D3.js JavaScript library by D3JS.org
VIS.js open source library by VISJS.org

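Users can build their own applications and products on top of Amazon Neptune with these visualization libraries. In the hands-on part of this post we focus on using VIS.js as the front end for visualizing data in Amazon Neptune. VIS.js is a JavaScript library for visualizing graph data. It offers components such as DataSet, Timeline, Graph2D, Graph3D, and Network, which present graph database analysis in a variety of graphical forms.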

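(2) Serverless reference architecture for Neptune visualization

As the architecture diagram shows, the solution consists of the following main steps:

Create a Neptune DB cluster and an EC2 instance in the AWS Oregon region
Log in to the EC2 instance and load data into the Neptune graph database
Create and configure the Lambda function
Create and configure the API Gateway proxy API
Deploy an S3-hosted static website in the AWS Beijing region that calls the API Gateway endpoint remotely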

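(3) Create the Neptune DB cluster and load data

Follow the documentation to create a Neptune cluster inside a VPC

https://docs.aws.amazon.com/neptune/latest/userguide/get-started.html

Follow the documentation to complete the initial setup for data loading (create an S3 bucket, an S3 VPC endpoint, and an IAM role)

https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html

In the AWS Oregon region, launch a t3 instance from the community AMI (ami-0b4ab2ba75e2ef70c)

This AMI bundles all the resources needed for the test, and its default user is ec2-user. Set up AWS CLI profiles for the AWS China region and the global region yourself.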

$ aws configure --profile bjs ### AWS Beijing region
AWS Access Key ID [None]: AKI***************
AWS Secret Access Key [None]: tr7*********************
Default region name [None]: cn-north-1
Default output format [None]: json

$ aws configure --profile pdx ### AWS Oregon region
AWS Access Key ID [None]: AKI***************
AWS Secret Access Key [None]: 14U*********************
Default region name [None]: us-west-2
Default output format [None]: json

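Load the S3 data into Neptune

Sync the test data to your S3 bucket

$ aws s3 sync /home/ec2-user/sampledata s3://your-s3-bucket --profile pdx

Edit the data loading script so that it uses your own cluster endpoint, S3 bucket, and IAM role ARN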

$ cat /home/ec2-user/load-twitter.sh

curl -X POST \
-H 'Content-Type: application/json' \
https://your-neptune-cluster-endpoint:8182/loader -d '
{
    "source" : "s3://your-s3-bucket/neptune/csv/twitter/",
    "format" : "csv",
    "iamRoleArn" : "your-iam-role-arn",
    "region" : "us-west-2",
    "failOnError" : "FALSE"
}'
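The loader request above is hand-quoted JSON inside a shell script, which is easy to break with a stray quote or comma. As a minimal sketch, the same request body can be generated and checked in Python before it is sent. The function name `build_loader_request`, the bucket name, and the role ARN are illustrative placeholders, not part of the original setup.

```python
import json


def build_loader_request(source, iam_role_arn, region,
                         data_format="csv", fail_on_error=False):
    """Return the JSON body for a POST to the Neptune /loader endpoint."""
    if not source.startswith("s3://"):
        raise ValueError("source must be an s3:// URI, got %r" % source)
    body = {
        "source": source,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        # The script above passes this flag as the string "FALSE", so the
        # boolean is rendered as "TRUE"/"FALSE" rather than a JSON boolean.
        "failOnError": "TRUE" if fail_on_error else "FALSE",
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Placeholder bucket and role ARN; substitute your own values.
    print(build_loader_request(
        "s3://your-s3-bucket/neptune/csv/twitter/",
        "arn:aws:iam::123456789012:role/YourNeptuneLoadRole",
        "us-west-2",
    ))
```

The printed body can be saved to a file and passed to curl with `-d @file.json`, which avoids nesting quotes in the shell.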

Run the script to load the data into Neptune

$ sh /home/ec2-user/load-twitter.sh

{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "63afaf9e-76aa-4a7a-8485-5a27cef0e97f"
    }
}

Check the load status, using the loadId returned above

$ curl -G 'https://your-neptune-cluster-endpoint:8182/loader/63afaf9e-76aa-4a7a-8485-5a27cef0e97f'

{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 7
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://your-s3-bucket/neptune/csv/twitter/",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 18,
            "startTime" : 1566271487,
            "totalRecords" : 47400,
            "totalDuplicates" : 914,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}
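When the load is scripted, the status response is easier to act on once it is reduced to a few fields. The sketch below parses a response of the shape shown above and reports whether the load completed and how many errors it counted. The function name `summarize_load` is an assumption for illustration, and the sample text mirrors the output above with a placeholder bucket name.

```python
import json


def summarize_load(response_text):
    """Summarize a Neptune loader status response (the JSON printed by curl)."""
    overall = json.loads(response_text)["payload"]["overallStatus"]
    error_count = (
        overall["parsingErrors"]
        + overall["datatypeMismatchErrors"]
        + overall["insertErrors"]
    )
    return {
        # Only LOAD_COMPLETED is treated as success; any other status
        # means the load is still running or did not finish cleanly.
        "completed": overall["status"] == "LOAD_COMPLETED",
        "records": overall["totalRecords"],
        "duplicates": overall["totalDuplicates"],
        "errors": error_count,
    }


SAMPLE = """
{
  "status": "200 OK",
  "payload": {
    "feedCount": [{"LOAD_COMPLETED": 7}],
    "overallStatus": {
      "fullUri": "s3://your-s3-bucket/neptune/csv/twitter/",
      "runNumber": 1,
      "retryNumber": 0,
      "status": "LOAD_COMPLETED",
      "totalTimeSpent": 18,
      "startTime": 1566271487,
      "totalRecords": 47400,
      "totalDuplicates": 914,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    }
  }
}
"""

if __name__ == "__main__":
    print(summarize_load(SAMPLE))
```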

(4) Create and configure the AWS Lambda function

Create the execution role that the AWS Lambda function needs

$ aws iam create-role --path /service-role/ --role-name lambda-vpc-access-role --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}' --description "VPC Access role for lambda function" --profile pdx
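An inline policy document with a missing brace or comma makes `create-role` fail with a malformed-policy error, so it can help to check the JSON locally first. The following is a small sketch of such a check. The function name `check_trust_policy` is hypothetical, and the check covers only the fields this role uses.

```python
import json


def check_trust_policy(policy_text, service="lambda.amazonaws.com"):
    """Parse a trust policy and confirm it lets `service` assume the role."""
    policy = json.loads(policy_text)  # raises json.JSONDecodeError on bad syntax
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if (stmt.get("Effect") == "Allow"
                and service in services
                and "sts:AssumeRole" in actions):
            return True
    return False


TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
"""

if __name__ == "__main__":
    print(check_trust_policy(TRUST_POLICY))
```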

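Attach the required permissions to the role

$ aws iam attach-role-policy --role-name lambda-vpc-access-role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaENIManagementAccess --profile pdx

Create the AWS Lambda function, replacing the environment-specific values (marked in red in the original post) with your own. The subnets, security group, and environment variables can also be adjusted in the console. Reading the Lambda source code is recommended for understanding the Gremlin queries. Note that the nodejs8.10 runtime used below was current when this post was written and has since been deprecated by AWS Lambda, so a currently supported Node.js runtime will be needed.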

$ aws lambda create-function --function-name mygremlin01 \
--role "arn:aws:iam::your-12-digit-account-id:role/service-role/lambda-vpc-access-role" \
--runtime nodejs8.10 --handler indexLambda.handler \
--description "Lambda function to make gremlin calls to Amazon Neptune" \
--timeout 120 --memory-size 256 --publish \
--vpc-config SubnetIds=subnet-8afde9ec,subnet-ac487fe4,subnet-9a1adcc0,SecurityGroupIds=sg-4bc9e337 \
--zip-file fileb:///home/ec2-user/amazon-neptune-samples/gremlin/visjs-neptune/lambdapackage.zip \
--environment Variables="{NEPTUNE_CLUSTER_ENDPOINT=mygdbcluster01.cluster-crqg2j5cykhg.us-west-2.neptune.amazonaws.com,NEPTUNE_PORT=8182}" --profile pdx
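The deployed function is the Node.js handler packaged in lambdapackage.zip, and its source is not reproduced in this post. To illustrate the contract it has to satisfy behind an API Gateway proxy integration, here is a hedged Python sketch: the event carries `path` and `queryStringParameters`, and the return value must contain `statusCode`, `headers`, and a string `body`. The `find_neighbours` stub, its return shape, and the CORS header are assumptions for illustration, not the behaviour of the packaged function.

```python
import json


def find_neighbours(vertex_id):
    """Hypothetical stand-in for the Gremlin query against Neptune."""
    # A real function would connect to NEPTUNE_CLUSTER_ENDPOINT on
    # NEPTUNE_PORT and run a traversal; this stub returns fixed data.
    return [{"id": "78", "label": "User", "name": "example-user"}]


def handler(event, context):
    """Handle an API Gateway Lambda proxy event for /neighbours?id=..."""
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    headers = {
        "Content-Type": "application/json",
        # The static site is served from a different origin (S3), so the
        # browser needs a CORS header before it will expose the response.
        "Access-Control-Allow-Origin": "*",
    }
    if event.get("path") != "/neighbours" or "id" not in params:
        return {
            "statusCode": 400,
            "headers": headers,
            "body": json.dumps({"error": "expected /neighbours?id=<vertex id>"}),
        }
    return {
        "statusCode": 200,
        "headers": headers,
        "body": json.dumps(find_neighbours(params["id"])),
    }


if __name__ == "__main__":
    event = {"path": "/neighbours", "queryStringParameters": {"id": "77"}}
    print(handler(event, None))
```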

(5) Create and configure the Amazon API Gateway proxy API

Create the REST API with the following AWS CLI command

$ aws apigateway create-rest-api --name lambda-neptune-proxy-api --description "API Proxy for AWS Lambda function in VPC accessing Amazon Neptune" --profile pdx

{
    "apiKeySource": "HEADER",
    "description": "API Proxy for AWS Lambda function in VPC accessing Amazon Neptune",
    "createdDate": 1566278703,
    "endpointConfiguration": {
        "types": [
            "EDGE"
        ]
    },
    "id": "m68yv27u24",
    "name": "lambda-neptune-proxy-api"
}

Note the value of the "id" field in the output above and use it as the <rest-api-id> value below

$ aws apigateway get-resources --rest-api-id m68yv27u24 --profile pdx

{
    "items": [
        {
            "path": "/",
            "id": "hpbz6o3ytf"
        }
    ]
}

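Note the value of the "id" field in the output above and use it as the <parent-id> value below. The following command creates a resource under the root of the API

$ aws apigateway create-resource --rest-api-id m68yv27u24 --parent-id hpbz6o3ytf --path-part '{proxy+}' --profile pdx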

{
    "path": "/{proxy+}",
    "pathPart": "{proxy+}",
    "id": "s3iu93",
    "parentId": "hpbz6o3ytf"
}

Note the value of the "id" field in the output and use it as <resource-id> in the command below

$ aws apigateway put-method --rest-api-id m68yv27u24 --resource-id s3iu93 --http-method ANY \
--authorization-type NONE --profile pdx

{
    "apiKeyRequired": false,
    "httpMethod": "ANY",
    "authorizationType": "NONE"
}

Using the values obtained from the previous commands, create the integration for the API method

$ aws apigateway put-integration --rest-api-id m68yv27u24 \
--resource-id s3iu93 --http-method ANY --type AWS_PROXY \
--integration-http-method POST \
--uri arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-west-2:your-12-digit-account-id:function:mygremlin01/invocations --profile pdx

{
    "passthroughBehavior": "WHEN_NO_MATCH",
    "timeoutInMillis": 29000,
    "uri": "arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-west-2:your-12-digit-account-id:function:mygremlin01/invocations",
    "httpMethod": "POST",
    "cacheNamespace": "s3iu93",
    "type": "AWS_PROXY",
    "cacheKeyParameters": []
}

Deploy the API with the following command

$ aws apigateway create-deployment --rest-api-id m68yv27u24 --stage-name test --profile pdx

{
    "id": "n0ro9h",
    "createdDate": 1566279566
}

Run the following command to grant API Gateway permission to invoke the AWS Lambda function

$ aws lambda add-permission --function-name mygremlin01 \
--statement-id myapigw01 --action 'lambda:*' \
--principal apigateway.amazonaws.com \
--source-arn 'arn:aws:execute-api:us-west-2:your-12-digit-account-id:m68yv27u24/*/*/*' --profile pdx

{
    "Statement": "{\"Sid\":\"myapigw01\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"apigateway.amazonaws.com\"},\"Action\":\"lambda:*\",\"Resource\":\"arn:aws:lambda:us-west-2:your-12-digit-account-id:function:mygremlin01\",\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:execute-api:us-west-2:your-12-digit-account-id:m68yv27u24/*/*/*\"}}}"
}
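The source ARN in the permission follows the pattern api-id/stage/METHOD/resource-path, so the three wildcards above match any stage, HTTP method, and resource path of this API. As a rough sketch, the ARN can be assembled from its parts as follows. The helper name `execute_api_source_arn` and the account ID 123456789012 are placeholders. API Gateway only needs the lambda:InvokeFunction action to call the function, so a tighter policy than lambda:* should also work.

```python
def execute_api_source_arn(region, account_id, api_id,
                           stage="*", method="*", resource="*"):
    """Build arn:aws:execute-api:region:account:api-id/stage/METHOD/path."""
    return "arn:aws:execute-api:%s:%s:%s/%s/%s/%s" % (
        region, account_id, api_id, stage, method, resource.lstrip("/"))


if __name__ == "__main__":
    # Matches every stage, method, and path, as in the command above.
    print(execute_api_source_arn("us-west-2", "123456789012", "m68yv27u24"))
    # Restricts the permission to the "test" stage only.
    print(execute_api_source_arn("us-west-2", "123456789012", "m68yv27u24",
                                 stage="test"))
```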

We have now created an API Gateway proxy for the AWS Lambda function. Test it as follows:

$ curl 'https://m68yv27u24.execute-api.us-west-2.amazonaws.com/test/neighbours?id=77'
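The same test can be scripted. The sketch below builds the invoke URL from the API id, region, and stage used above, and fetches it with the standard library. The helper names `invoke_url` and `fetch_json` are illustrative, and the fetch assumes the endpoint returns JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def invoke_url(api_id, region, stage, path, params=None):
    """Build the execute-api URL for a deployed REST API stage."""
    base = "https://%s.execute-api.%s.amazonaws.com/%s/%s" % (
        api_id, region, stage, path.lstrip("/"))
    # urlencode escapes parameter values, so ids with special characters are safe.
    return base + ("?" + urlencode(params) if params else "")


def fetch_json(url, timeout=30):
    """GET the URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(invoke_url("m68yv27u24", "us-west-2", "test", "neighbours", {"id": 77}))
```

Calling `fetch_json` on the printed URL performs the request that the curl command above makes.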

(6) Configure an Amazon S3 bucket to host the static website

Create an S3 bucket in the AWS Beijing region

$ aws s3api create-bucket --bucket myneptune77 --region cn-north-1 --create-bucket-configuration LocationConstraint=cn-north-1 --profile bjs

Enable static website hosting on the S3 bucket

$ aws s3api put-bucket-website --bucket myneptune77 --website-configuration '{
    "IndexDocument": {
        "Suffix": "index.html"
    },
    "ErrorDocument": {
        "Key": "error.html"
    }
}' --profile bjs

Edit the endpoint on line 57 of the HTML file: set PROXY_API_URL to the URL of the API Gateway endpoint you deployed

$ vi /home/ec2-user/s3webhost/index.html

var PROXY_API_URL = "https://m68yv27u24.execute-api.us-west-2.amazonaws.com/test";

Upload all the static files to the S3 bucket

$ aws s3 sync /home/ec2-user/s3webhost/ s3://myneptune77 --acl public-read --profile bjs

Open the following page in a browser to test

https://myneptune77.s3.cn-north-1.amazonaws.com.cn/index.html

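Select a person's name and click the Go button; the selected name appears

Click the name to query that person's friends and tweets and display them visually:

The query logic is wrapped in JavaScript inside the AWS Lambda function, which queries Neptune directly with Gremlin
Every click on the front-end page calls the Lambda function through the API Gateway endpoint; the function reads the database and returns the data as JSON
The front end parses the JSON data and renders the visualization with JavaScript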
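The last step, turning query results into something VIS.js can draw, amounts to producing a list of nodes (each with an `id` and `label`) and a list of edges (each with `from` and `to`), which is the data shape the VIS.js Network component takes. The sketch below shows that transformation in Python for clarity; the real page does it in browser JavaScript. The input record shape, the sample names, and the edge label are assumptions, since the post does not show the Lambda function's JSON format.

```python
def to_vis_network(center, neighbours, edge_label=""):
    """Convert a centre vertex and its neighbours into vis.js Network data."""
    vertices = {center["id"]: center}  # dict keeps first-seen order
    edges = []
    for vertex in neighbours:
        if vertex["id"] in vertices:   # skip duplicates and self-references
            continue
        vertices[vertex["id"]] = vertex
        edges.append({"from": center["id"], "to": vertex["id"],
                      "label": edge_label})
    nodes = [
        {
            "id": v["id"],
            # Fall back to the id when the vertex has no display name.
            "label": v.get("name", str(v["id"])),
            "group": v.get("label", ""),
        }
        for v in vertices.values()
    ]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # Hypothetical records; the real response format may differ.
    user = {"id": "77", "label": "User", "name": "example-user-a"}
    friends = [{"id": "78", "label": "User", "name": "example-user-b"}]
    print(to_vis_network(user, friends, "Follows"))
```

Keying the vertices by id means that clicking deeper into the graph and merging new results does not draw the same person twice.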

Clicking a person's name or a tweet expands the visualization one level deeper

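(7) Summary and references

In summary, by combining the AWS cloud-native services Amazon API Gateway and AWS Lambda with S3 static website hosting, users in mainland China can conveniently call a Neptune graph database remotely and visualize its data analysis, even though the AWS China regions do not yet offer Neptune as a managed service.

In this demo, apart from the cost of the Neptune graph database and the EC2 instance, usage of the other services stays within the free tier and is negligible.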

[References]

Amazon Neptune documentation

https://docs.aws.amazon.com/zh_cn/neptune/latest/userguide/intro.html

Amazon Neptune developer resources

https://aws.amazon.com/cn/neptune/developer-resources/

Amazon Neptune reference architectures

https://github.com/aws-samples/aws-dbs-refarch-graph

Amazon Neptune samples

https://github.com/aws-samples/amazon-neptune-samples

Amazon Neptune data import tools

https://github.com/awslabs/amazon-neptune-tools