AWS Web3 Blog

Automate Avalanche node deployment using the AWS CDK: Part 1

Avalanche is an EVM-compatible, layer-1 blockchain network. The protocol is built on a novel consensus mechanism and is paired with subnets that run their own virtual machines. Subnets enable the creation of custom, app-specific blockchains for different use cases and allow the Avalanche network to scale horizontally as more subnets are added.

At its core, a blockchain is a set of replicated state machines that run the same set of operations (for example, smart contract execution and asset transfers) in the same order, where blocks are ordered globally through consensus. At peak, the Avalanche network produces over 500 transactions per block with sub-second finality (see block explorer), which can make node operation resource-intensive. Although Avalanche fully supports on-premises strategies, a cloud-native environment can be more cost-effective and reliable, depending on your organization’s needs regarding locality and cost.

In this post, we address operational challenges using AWS primitives and the Rust SDK, and showcase a single-command deployment of an Avalanche node using the AWS Cloud Development Kit (AWS CDK). A future post will cover more advanced topics like private and custom network deployments and Avalanche Subnet installation.

Before we start, we want to highlight some key technologies from AWS that Avalanche utilizes to create a consistent node deployment.

Amazon EBS volume mapper

As a chain makes progress, its historical state grows and uses more disk space over time. A full archival node can take anywhere from a few hours to several days to sync the full chain state. Therefore, you need a static Amazon Elastic Block Store (Amazon EBS) volume per Availability Zone, in case the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance goes offline (for example, for hardware maintenance). That way, on recovery, the new EC2 instance can simply reload the existing data from the volume without going through another cycle of state sync. aws-volume-provisioner maintains the static mapping between the Availability Zone and newly launched EC2 instances in order to retain the node state.

For a single-node deployment, we pin the node to one Availability Zone, because an EBS volume is a zone-specific resource. Retaining the volume is particularly useful when the machine is a Spot Instance, which restarts more frequently.
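
The volume reuse described above can be sketched with the AWS CLI. This is an illustration of the idea only, not how aws-volume-provisioner is implemented; the tag key Id, the instance ID, and the device name are hypothetical choices made for this example.

```shell
# Look up an unattached EBS volume in one Availability Zone by tag.
# Prints the volume ID, or returns 1 when none exists and a fresh volume is needed.
# NOTE: the tag key "Id" is a hypothetical naming choice for this sketch.
find_reusable_volume() {
  local az="$1" node_id="$2" volume_id
  volume_id=$(aws ec2 describe-volumes \
    --filters "Name=availability-zone,Values=${az}" \
              "Name=tag:Id,Values=${node_id}" \
              "Name=status,Values=available" \
    --query 'Volumes[0].VolumeId' --output text) || return 1
  # With --output text, a missing element is printed as the literal string "None".
  [ -n "${volume_id}" ] && [ "${volume_id}" != "None" ] || return 1
  printf '%s\n' "${volume_id}"
}

# Usage on a freshly launched instance (IDs and device name are placeholders):
# vol=$(find_reusable_volume us-west-2a my-cluster-id) &&
#   aws ec2 attach-volume --volume-id "${vol}" \
#     --instance-id i-0123456789abcdef0 --device /dev/xvdb
```

Filtering on the Availability Zone matters because a volume can only be attached to an instance in the same zone.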

Elastic IP mapper

An Avalanche node is not required to have a fixed IP address, because randomly sampled peers track its uptime by node ID. However, having an IP statically mapped to a node ID makes node monitoring easier and enables more stable anchor-node discovery for private network use cases. aws-ip-provisioner maintains the static mapping between the EBS volume and the Elastic IP.

Note that peer discovery via hard-coded IPs is in general an anti-pattern in cloud networking and system management. However, decentralized networks like Avalanche require vendor-neutral mechanisms for beacon/anchor node discovery. Therefore, Avalanche maintains a well-established list of anchor node IPs in the code base (see the code on GitHub).

Avalanche agent

avalanched is an agent (or daemon) that runs on every remote machine and creates and installs Avalanche-specific resources (for example, generating TLS certificates, discovering anchor nodes, and writing the Avalanche node service files). After the basic AWS resources are provisioned with the AWS CDK and AWS CloudFormation, avalanched auto-generates staking TLS certificates and stores them, encrypted, in Amazon Simple Storage Service (Amazon S3), based on user-provided configuration (see the default avalanche-ops configuration). The TLS certificates uniquely identify each node (each certificate maps to a node ID), so it’s important to back them up safely. avalanched uses the user-provided AWS Key Management Service (AWS KMS) key to envelope encrypt the keys before uploading them to Amazon S3.
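
To make the envelope encryption step concrete, here is a minimal sketch using the AWS CLI and OpenSSL. It is not the avalanched implementation: KMS issues a one-time data key, the file is encrypted locally with that key, and only the KMS-wrapped copy of the data key is kept next to the ciphertext. The sketch uses AES-256-CBC because openssl enc does not support AEAD modes, so it has no counterpart to the AAD tag; the file names are placeholders.

```shell
# Decode a base64 string into lowercase hex, the format `openssl enc -K` expects.
b64_to_hex() {
  printf '%s' "$1" | base64 -d | od -An -v -tx1 | tr -d ' \n'
}

# Encrypt FILE under a one-time data key issued by KMS.
# Writes FILE.enc (ciphertext), FILE.iv (hex IV) and FILE.key.enc (the data key
# wrapped by the KMS key, base64). The plaintext data key is never written to disk.
# Caveat: passing the key with -K exposes it to other processes on the host.
envelope_encrypt() {
  local file="$1" kms_key="$2" out plaintext_b64 wrapped_b64 iv
  out=$(aws kms generate-data-key --key-id "${kms_key}" --key-spec AES_256 \
    --query '[Plaintext,CiphertextBlob]' --output text) || return 1
  # Text output puts both values on one line, separated by a tab.
  plaintext_b64=$(printf '%s\n' "${out}" | cut -f1)
  wrapped_b64=$(printf '%s\n' "${out}" | cut -f2)
  iv=$(openssl rand -hex 16) || return 1
  openssl enc -aes-256-cbc -K "$(b64_to_hex "${plaintext_b64}")" -iv "${iv}" \
    -in "${file}" -out "${file}.enc" || return 1
  printf '%s\n' "${iv}" > "${file}.iv"
  printf '%s\n' "${wrapped_b64}" > "${file}.key.enc"
}

# Usage (file name is a placeholder), followed by the S3 backup:
# envelope_encrypt staker.key "${KMS_CMK_ARN}"
# aws s3 cp staker.key.enc "s3://${S3_BUCKET_NAME}/staker.key.enc"
```

To decrypt, you would first call aws kms decrypt on the wrapped key, so anyone restoring the backup needs access to the KMS key as well as the S3 object.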

Avalanche telemetry

Each Avalanche node provides metrics that monitor the overall health and performance of the validators. avalanche-telemetry-cloudwatch is an agent that routinely collects such metrics and reports them to a dedicated telemetry recording service (for example, Amazon CloudWatch). Node operators can create alarms that alert the team if any abnormal conditions are found. The agent is written in Rust using the AWS SDK for Rust. An Avalanche node exposes its metrics via a Prometheus endpoint, and the agent periodically queries and parses the metrics data based on the regex-based configuration.
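
The collect-filter-publish loop can be approximated in a few lines of shell. This is a simplified sketch of the idea rather than the agent’s actual logic; the metric names, regex, and CloudWatch namespace shown are hypothetical.

```shell
# Read Prometheus text-format metrics on stdin and print "name value" for every
# sample whose metric name matches the given regular expression.
# Simplification: assumes label values contain no spaces.
filter_metrics() {
  awk -v pat="$1" '
    /^#/ || NF < 2 { next }      # skip HELP/TYPE comments and blank lines
    {
      name = $1
      sub(/[{].*/, "", name)     # drop the label set, if any
      if (name ~ pat) print name, $2
    }'
}

# Publish one sample to CloudWatch under a caller-chosen namespace.
publish_metric() {
  aws cloudwatch put-metric-data --namespace "$1" --metric-name "$2" --value "$3"
}

# Usage (namespace and pattern are placeholders):
# curl -s http://localhost:9650/ext/metrics |
#   filter_metrics '^avalanche_network_' |
#   while read -r name value; do
#     publish_metric my-cluster-id "${name}" "${value}"
#   done
```

Filtering by regex before publishing keeps the number of CloudWatch metrics, and therefore the cost, under the operator’s control.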

Solution overview

The following system diagram illustrates our solution architecture.

We walk you through the following high-level steps to set up this solution:

  1. Clone the GitHub repo.
  2. Create an S3 bucket for backing up the envelope encrypted node certificate.
  3. Create a KMS key for envelope encrypting the node certificate.
  4. Create an EC2 key pair for SSH access to the node.
  5. Create an EC2 instance role for the Avalanche node.
  6. Create a VPC for the Avalanche node.
  7. Create an EC2 auto scaling group for the Avalanche node.

Prerequisites

For this walkthrough, the following are required:
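
The commands in this walkthrough call git, aws, npm, and npx, so a quick check like the following sketch can confirm that they are on your PATH before you begin. The helper name missing_tools is introduced here for illustration only.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

# Usage: report anything the walkthrough needs that is not installed.
# missing_tools git aws npm npx
```

An empty result means every listed tool was found.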

Clone the GitHub repo

Use the following code to clone the GitHub repo:

cd ${HOME}
git clone https://github.com/ava-labs/avalanche-ops.git
cd ./avalanche-ops

Set AWS Region

Set the AWS_REGION environment variable from the default Region of your AWS CLI as follows:

export AWS_REGION=$(aws configure get region)

If no default Region is configured yet, you can set one with the aws configure command:

aws configure set default.region us-west-2

Create an S3 bucket for backing up the envelope encrypted node certificate

Use the following code to create your S3 bucket (replace S3_BUCKET_NAME with your own S3 bucket name):

# e.g.,
S3_BUCKET_NAME=avalancheup-aws-test-bucket-with-cdk-<your name>
aws s3api \
create-bucket \
--bucket ${S3_BUCKET_NAME} \
--region ${AWS_REGION} \
--create-bucket-configuration LocationConstraint=${AWS_REGION}

aws s3 ls s3://${S3_BUCKET_NAME}/
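
One caveat with the command above: us-east-1 is the default bucket location, and create-bucket rejects an explicit LocationConstraint for it. The following sketch builds the argument list for either case; the helper name create_bucket_args is hypothetical.

```shell
# Print the arguments for `aws s3api create-bucket`, one per line.
# us-east-1 is the default location and must not be given as a LocationConstraint.
create_bucket_args() {
  local bucket="$1" region="$2"
  printf '%s\n' --bucket "${bucket}" --region "${region}"
  if [ "${region}" != "us-east-1" ]; then
    printf '%s\n' --create-bucket-configuration "LocationConstraint=${region}"
  fi
}

# Usage:
# aws s3api create-bucket $(create_bucket_args "${S3_BUCKET_NAME}" "${AWS_REGION}")
```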

Create a KMS key for envelope encrypting the node certificate

Create your KMS key with the following code, which stores the new key’s ARN in KMS_CMK_ARN (to use an existing key, set KMS_CMK_ARN to its ARN instead):

KMS_CMK_ARN=$(aws kms create-key --query KeyMetadata.Arn --output text)
echo ${KMS_CMK_ARN}

# e.g., to use an existing key instead, set its ARN:
# KMS_CMK_ARN=arn:aws:kms:us-west-2:931867039610:key/c5a7894a-ddd8-4b67-8c98-081d053bc4e9
aws kms describe-key --key-id ${KMS_CMK_ARN} --query KeyMetadata.Arn --output text

# e.g.,
# arn:aws:kms:us-west-2:931867039610:key/c5a7894a-ddd8-4b67-8c98-081d053bc4e9

To delete the key later, use the following code:

# e.g.,
KMS_CMK_ARN=arn:aws:kms:us-west-2:931867039610:key/c5a7894a-ddd8-4b67-8c98-081d053bc4e9

# to delete
aws kms schedule-key-deletion \
--key-id ${KMS_CMK_ARN} \
--pending-window-in-days 7
aws kms describe-key --key-id ${KMS_CMK_ARN}

Create an EC2 key pair for SSH access to the nodes

Use the following code to create your EC2 key pair (replace EC2_KEY_PAIR_NAME with your own EC2 key pair name):

# e.g.,
EC2_KEY_PAIR_NAME=avalancheup-aws-test-ec2-key-with-cdk
aws ec2 create-key-pair --key-name ${EC2_KEY_PAIR_NAME}
aws ec2 describe-key-pairs --key-names ${EC2_KEY_PAIR_NAME}

# to delete
aws ec2 delete-key-pair --key-name ${EC2_KEY_PAIR_NAME}
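
The private key is returned only once, when the key pair is created, so you will usually want to write it to a file for later SSH use. A minimal sketch, assuming you keep the key under ${HOME}/.ssh (the path and the helper name save_key_pair are illustrative):

```shell
# Create an EC2 key pair and store the private key in a file only the owner can read.
save_key_pair() {
  local name="$1" path="$2"
  (
    umask 077   # the file must never be group- or world-readable
    aws ec2 create-key-pair --key-name "${name}" \
      --query 'KeyMaterial' --output text > "${path}"
  ) || return 1
  chmod 400 "${path}"
}

# Usage:
# save_key_pair "${EC2_KEY_PAIR_NAME}" "${HOME}/.ssh/${EC2_KEY_PAIR_NAME}.pem"
```

Use this instead of, not in addition to, the plain create-key-pair call, because a key pair name can only be created once.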

Create an EC2 instance role for the Avalanche node

Set the following parameters:

  • CDK_REGION – The Region to create resources. The command below is using the default AWS region set on your AWS CLI.
  • CDK_ACCOUNT – The AWS account to create resources
  • ID – The unique identifier for the node
  • KMS_CMK_ARN – The KMS customer managed key (CMK) ARN for envelope encryption of your node certificate
  • S3_BUCKET_NAME – The S3 bucket name to back up the node certificate. S3 bucket names are globally unique, so append your name (or another unique suffix) to the end of the string.

For example:

cd ${HOME}/avalanche-ops/cdk/avalancheup-aws
npm install
cdk bootstrap
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
ID=my-cluster-id \
KMS_CMK_ARN=arn:aws:kms:us-west-2:931867039610:key/c5a7894a-ddd8-4b67-8c98-081d053bc4e9 \
S3_BUCKET_NAME=avalancheup-aws-test-bucket-with-cdk-<your name> \
npx cdk deploy avalancheup-aws-instance-role-stack

See ec2_instance_role.yaml for the CloudFormation template.

Create a VPC for the Avalanche node

Set the following parameters:

  • CDK_REGION – The Region to create resources
  • CDK_ACCOUNT – The AWS account to create resources
  • ID – The unique identifier for the node

For example:

cd ${HOME}/avalanche-ops/cdk/avalancheup-aws
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
ID=my-cluster-id \
npx cdk deploy avalancheup-aws-vpc-stack

See vpc.yaml for the CloudFormation template.
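
The next step needs values created by the role and VPC stacks (instance profile ARN, security group ID, subnet IDs, VPC ID). One way to fetch them is to read the CloudFormation stack outputs, as in this sketch. It assumes the CloudFormation stack names match the CDK stack names used above, and the output key shown is hypothetical; list the outputs first to find the real key names.

```shell
# Print the value of one CloudFormation stack output.
stack_output() {
  local stack="$1" key="$2"
  aws cloudformation describe-stacks --stack-name "${stack}" \
    --query "Stacks[0].Outputs[?OutputKey=='${key}'].OutputValue" --output text
}

# List every output first to learn the real key names:
# aws cloudformation describe-stacks --stack-name avalancheup-aws-vpc-stack \
#   --query 'Stacks[0].Outputs'
# Then, with a hypothetical key name:
# SECURITY_GROUP_ID=$(stack_output avalancheup-aws-vpc-stack SecurityGroupId)
```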

Create an EC2 auto scaling group for the Avalanche node

Set the following parameters:

  • CDK_REGION – The Region to create resources
  • CDK_ACCOUNT – The AWS account to create resources
  • ID – The unique identifier for the node
  • KMS_CMK_ARN – The KMS CMK ARN for envelope encryption of your node certificate
  • S3_BUCKET_NAME – The S3 bucket name to back up the node certificate
  • EC2_KEY_PAIR_NAME – The EC2 key pair name for SSH access
  • AAD_TAG – The additional authenticated data (AAD) tag for envelope encryption
  • INSTANCE_PROFILE_ARN – The EC2 instance profile ARN
  • SECURITY_GROUP_ID – The VPC security group ID
  • PUBLIC_SUBNET_IDS – The public subnet IDs created for the VPC
  • NETWORK_ID – The network ID: 1 for mainnet, 5 for the Fuji test network
  • NLB_VPC_ID – The VPC ID, used for setting up a Network Load Balancer

For example:

cd ${HOME}/avalanche-ops/cdk/avalancheup-aws
export PUBLIC_SUBNET_IDS='subnet-0fbebd0c4c9b5b279,subnet-0639ef980e9b04daa,subnet-08342c679e80f033f'
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
ID=my-cluster-id \
KMS_CMK_ARN=arn:aws:kms:us-west-2:931867039610:key/c5a7894a-ddd8-4b67-8c98-081d053bc4e9 \
S3_BUCKET_NAME=avalancheup-aws-test-bucket-with-cdk-<your name> \
EC2_KEY_PAIR_NAME=avalancheup-aws-test-ec2-key-with-cdk \
AAD_TAG=my-aad-tag \
INSTANCE_PROFILE_ARN=arn:aws:iam::931867039610:instance-profile/my-cluster-id-instance-profile \
SECURITY_GROUP_ID=sg-0563ba3fe3bed012b \
NETWORK_ID=5 \
NLB_VPC_ID=vpc-095b0c9cc6ce9ba55 \
npx cdk deploy avalancheup-aws-asg-stack

See asg_amd64_ubuntu.yaml for the CloudFormation template.

Check the created resources

By default, the solution creates an Elastic IP per node. If NlbEnabled is set to true (found in the CloudFormation template), use the NlbDnsName output from the preceding stack, otherwise use the Elastic IP to check the metrics and RPC endpoints. For instance, http:// + NlbDnsName/Elastic IP + :9650/ext/metrics returns the current metrics of the node as shown in the following screenshot.
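
From a terminal, the same check can be scripted with curl. In this sketch the address is a placeholder for your NlbDnsName or Elastic IP, and the helper names are illustrative.

```shell
# Build the URL of a node API path from a host (NLB DNS name or Elastic IP).
node_url() {
  printf 'http://%s:9650%s\n' "$1" "$2"
}

# Print the HTTP status code returned for a node API path.
node_http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(node_url "$1" "$2")"
}

# Usage (the address is a placeholder):
# node_http_status 203.0.113.10 /ext/metrics
# curl -s "$(node_url 203.0.113.10 /ext/metrics)" | head
```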

Go to CloudWatch Logs to see the logs being published from avalanched.

The following screenshot shows a list of log events for this cluster.

Go to the CloudWatch Metrics page to see the metrics being published from avalanched.

The following screenshot shows an example of graphed metrics.

The default setup creates inbound rules that open the SSH and HTTP ports to the public. Because these ports aren’t used for peer-to-peer communication and are only needed for host machine access, we strongly advise limiting the CIDR range to your own IP address.
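
One way to tighten the rules from the command line is sketched below. It assumes the world-open rules use 0.0.0.0/0 and that the HTTP port is 9650, the port used for the metrics endpoint above; check the actual rules on your security group before running anything like this.

```shell
# Turn a bare IPv4 address into a single-host CIDR block.
# Basic character check only: rejects empty input or anything besides digits and dots.
to_cidr32() {
  case "$1" in
    *[!0-9.]* | "") return 1 ;;
  esac
  printf '%s/32\n' "$1"
}

# Replace a world-open inbound TCP rule with one limited to a single address.
# The new rule is added first so that access is never lost in between.
restrict_port() {
  local group_id="$1" port="$2" cidr
  cidr=$(to_cidr32 "$3") || return 1
  aws ec2 authorize-security-group-ingress --group-id "${group_id}" \
    --protocol tcp --port "${port}" --cidr "${cidr}" || return 1
  aws ec2 revoke-security-group-ingress --group-id "${group_id}" \
    --protocol tcp --port "${port}" --cidr 0.0.0.0/0
}

# Usage:
# my_ip=$(curl -s https://checkip.amazonaws.com)
# restrict_port "${SECURITY_GROUP_ID}" 22 "${my_ip}"
# restrict_port "${SECURITY_GROUP_ID}" 9650 "${my_ip}"
```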

Clean up

To clean up your resources, run cdk destroy for each stack as follows. Each command reads your account number with aws sts get-caller-identity and passes it to the stack in the CDK_ACCOUNT variable.

cd ${HOME}/avalanche-ops/cdk/avalancheup-aws
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
cdk destroy avalancheup-aws-asg-stack
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
cdk destroy avalancheup-aws-vpc-stack
CDK_REGION=$(aws configure get region) \
CDK_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output=text) \
cdk destroy avalancheup-aws-instance-role-stack

Using a Rust-based CLI

If you are looking for a Rust-based CLI, check out avalancheup-aws/recipes.

Conclusion

In this post, we showed how to use the AWS CDK to deploy an Avalanche node on AWS. The avalanched agent helps implement Avalanche-specific installation logic that would otherwise be harder to apply in bash scripts. aws-volume-provisioner retains the existing volumes in case an EC2 instance is deleted. avalanche-telemetry-cloudwatch collects Avalanche node metrics that can be used to monitor client-perceived latencies and other performance and reliability issues.

To learn more about Avalanche, check out the avalanche-ops recipes, AWS CDK deployment instructions, and official Avalanche documentation.


About the authors

Raj Seshadri is a Senior Partners Solutions Architect with AWS and a valued member of the Technical Field Community for both containers and blockchain. With an insatiable appetite for exploring blockchain technology, Raj is particularly drawn to Ethereum, Web3, NFTs, and DeFi. Before joining AWS, Raj acquired significant industry experience with notable companies such as Aqua Security, Red Hat, Dell, and EMC. In his spare time, he plays tennis and enjoys traveling around the world. Follow him on Twitter @texanraj to stay up-to-date on his latest thoughts and insights.

Gyuho Lee is a staff software engineer at Ava Labs, working on the consensus protocols and various tooling. Previously, he worked at AWS EKS as a senior software engineer and was a lead maintainer of etcd.