AWS Web3 Blog
Run Ethereum nodes on AWS
Amazon Managed Blockchain and many AWS Partners offer a convenient way to use Ethereum nodes without operating your own infrastructure. However, when you want to run archive nodes or participate in Ethereum staking, managed nodes may not be enough, and you may choose to run your own Ethereum nodes on AWS.
To run a self-managed node, you need to configure server-side software components called Ethereum clients. After The Merge, each Ethereum node needs to run two clients: the execution layer (EL) client and the consensus layer (CL) client. These clients work in tandem to synchronize the global state with other nodes of the distributed Ethereum database clusters (usually called blockchain networks), such as mainnet, Goerli, and Sepolia. When you first install and configure both clients, they contain no data and need to catch up with the current state of the blockchain network maintained by the rest of the nodes. This process is called the initial sync, and it can take multiple days because of the volume of data that needs to be synchronized.
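As a minimal sketch of how an EL/CL pair is wired together in practice, assume the two clients talk over the authenticated Engine API and share a 32-byte hex secret stored in a file (Geth reads it with --authrpc.jwtsecret, Lighthouse with --execution-jwt). The helper name and the file path used in the test are hypothetical.

```python
import secrets
from pathlib import Path


def write_jwt_secret(path: Path) -> str:
    """Create the shared Engine API secret for an EL/CL client pair.

    Both clients are pointed at the same file: the EL client uses it to
    authenticate the Engine API, and the CL client signs its requests with it.
    """
    secret = secrets.token_hex(32)  # 32 random bytes -> 64 lowercase hex characters
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(secret)
    path.chmod(0o600)  # keep the secret readable only by its owner
    return secret
```

Generating the secret once and pointing both clients at the same file avoids the common failure where the CL client cannot authenticate to its EL client.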
In this post, we share our experience setting up Ethereum nodes on AWS and describe ways to speed up the initial sync so you can quickly bring up new nodes when needed.
Speeding up the initial sync
When you start both clients on a new Ethereum mainnet node, you need to wait for the CL client to sync from the genesis block up to The Merge before the EL client can start syncing blocks. Until then, the EL client either idles or downloads only receipts and block headers. This is time-consuming: in our tests, it took around 4 days for a CL client such as Prysm to sync from the genesis block.
To speed up the process, most CL clients provide a checkpoint sync option that makes them sync only from the latest finalized beacon chain checkpoint. The beacon chain introduced the consensus engine (or consensus layer) that replaced proof-of-work mining with proof-of-stake validation. When configuring checkpoint sync, you have to provide the URL of a checkpoint sync provider you trust. The Ethereum community maintains a list of public checkpoint sync endpoints to choose from. For more information about checkpoint sync, see How To: Checkpoint Sync.
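For Lighthouse, checkpoint sync is enabled with the --checkpoint-sync-url flag. The following sketch only builds the beacon node command line. It assumes a local EL client serving the Engine API on the default port 8551. The data directory, JWT path, and checkpoint URL are hypothetical placeholders, and requiring HTTPS is our own design choice.

```python
from typing import List


def lighthouse_checkpoint_sync_cmd(
    checkpoint_url: str,
    datadir: str = "/data/lighthouse",            # hypothetical path
    jwt_path: str = "/data/jwt.hex",              # hypothetical path
    execution_endpoint: str = "http://localhost:8551",
) -> List[str]:
    """Build a Lighthouse beacon node command that starts from a checkpoint."""
    if not checkpoint_url.startswith("https://"):
        # Design choice: only accept TLS endpoints from a provider you trust.
        raise ValueError("checkpoint sync URL must use HTTPS")
    return [
        "lighthouse", "bn",
        "--network", "mainnet",
        "--datadir", datadir,
        "--execution-endpoint", execution_endpoint,
        "--execution-jwt", jwt_path,
        "--checkpoint-sync-url", checkpoint_url,
    ]
```

For example, `" ".join(lighthouse_checkpoint_sync_cmd("https://checkpoint.example.org"))` yields a command you could place in a systemd unit, with the placeholder URL replaced by one from the community list.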
When you use checkpoint sync, the CL client syncs the state of the Ethereum beacon chain from the latest checkpoint (a block in the first slot of an epoch) and becomes a fully operational CL client without syncing all the previous blocks. The process usually completes within minutes, after which the CL client instructs the paired EL client to start synchronizing the execution layer blocks. CL clients synced from a checkpoint are good enough for validators and for kick-starting EL clients, but CL clients synced from the genesis block also let you query chain history and state, which archive nodes need. If you want to use checkpoint sync but also need those functions for additional analytics, you can use a CL client such as Lighthouse that supports backfilling previous blocks all the way to genesis and can then, optionally, reconstruct historical states as well.
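Because checkpoints sit on the first slot of an epoch, you can work out which slot a checkpoint corresponds to with simple arithmetic. This sketch uses the mainnet beacon chain parameters (genesis at Unix time 1606824023, 12-second slots, 32 slots per epoch). Note that the finalized checkpoint served by a provider typically lags the current epoch by about two epochs, so this computes the epoch boundary, not necessarily the checkpoint you will receive.

```python
MAINNET_GENESIS_TIME = 1606824023  # beacon chain genesis: 2020-12-01 12:00:23 UTC
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def slot_at(unix_time: int) -> int:
    """Return the beacon chain slot in progress at a given Unix time."""
    if unix_time < MAINNET_GENESIS_TIME:
        raise ValueError("time is before beacon chain genesis")
    return (unix_time - MAINNET_GENESIS_TIME) // SECONDS_PER_SLOT


def epoch_of(slot: int) -> int:
    """Return the epoch that contains a slot."""
    return slot // SLOTS_PER_EPOCH


def epoch_start_slot(epoch: int) -> int:
    """Return the first slot of an epoch, where checkpoints are anchored."""
    return epoch * SLOTS_PER_EPOCH
```

For instance, slot 345 falls in epoch 10, whose checkpoint slot is 320.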
You can configure EL clients such as Go Ethereum (Geth), Hyperledger Besu, and Nethermind as full nodes, which consume less disk space because they keep pruned state only for the most recent 128 blocks. Some EL clients in full node mode also support a quicker sync option called snap sync, which is about 10 times faster than syncing the state from the genesis block with full sync. Another node type is the archive node, which, in addition to all blocks, builds an archive of historical states to support richer historical queries. On the downside, EL clients configured as archive nodes can’t benefit from snap sync, and it still takes 5-7 days to sync them. To learn more about node types and sync modes, see Nodes and Clients.
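In Geth, the node type and sync mode map to the --syncmode and --gcmode flags. The sketch below assumes a recent Geth release in which an archive node is a full sync with garbage collection disabled (--gcmode archive). The node type labels, data directory, and JWT path are hypothetical.

```python
from typing import List

# Hypothetical node type labels mapped to Geth sync and GC flags.
GETH_MODES = {
    "full-snap": ["--syncmode", "snap"],                       # full node, fastest initial sync
    "full": ["--syncmode", "full"],                            # full node, replays every block
    "archive": ["--syncmode", "full", "--gcmode", "archive"],  # keeps all historical states
}


def geth_cmd(node_type: str, datadir: str = "/data/geth",
             jwt_path: str = "/data/jwt.hex") -> List[str]:
    """Build a Geth mainnet command line for the requested node type."""
    try:
        mode_flags = GETH_MODES[node_type]
    except KeyError:
        raise ValueError(f"unknown node type: {node_type!r}") from None
    return ["geth", "--mainnet", "--datadir", datadir,
            "--authrpc.jwtsecret", jwt_path, *mode_flags]
```

Keeping the mode flags in one table makes the trade-off explicit: only the archive entry disables state pruning, and it is also the one that cannot use snap sync.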
In our experience, combining the checkpoint sync option in the CL client with the snap sync option in the EL client shortens the initial sync time for full nodes from 5-7 days to about 1 day. If you need the richer functionality of an EL client running as an archive node, you should still use checkpoint sync in your CL client to save yourself multiple days of syncing.
After you synchronize your first node, you can use the AWS Cloud to scale those nodes horizontally. For better performance, we recommend using a separate Amazon Elastic Block Store (Amazon EBS) volume to store blockchain data and, when the initial sync is complete, copying that data to an Amazon Simple Storage Service (Amazon S3) bucket. Later, when you bring new nodes online, you can copy the blockchain data from the S3 bucket and cut the initial sync time for new nodes to less than an hour. In some situations, though, even the sync from a recent data copy can take longer, because your node may get stuck syncing the delta from a peer with constrained resources, such as limited network bandwidth or an overloaded CPU. In cases like this, monitor your node for slow sync and restart it to force it to connect to other peers.
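One way to detect a stuck sync is to sample the standard eth_syncing JSON-RPC result (false when synced, otherwise an object with hex-encoded currentBlock and highestBlock) and compare the import rate with the rate at which mainnet grows, at most one block per 12-second slot. This is a minimal sketch under that assumption. During the state download phase of snap sync, currentBlock may not move, so a production check would need more signals. The restart itself is not shown.

```python
from typing import List, Optional, Tuple

# Mainnet produces at most one block per 12-second slot: at most 5 blocks per minute.
CHAIN_BLOCKS_PER_MINUTE = 60 / 12


def parse_eth_syncing(result) -> Optional[Tuple[int, int]]:
    """Return (currentBlock, highestBlock) from an eth_syncing result, or None if synced."""
    if result is False:
        return None
    return int(result["currentBlock"], 16), int(result["highestBlock"], 16)


def looks_stuck(samples: List[Tuple[float, int]], head: int) -> bool:
    """Return True if a node that is behind the head is not gaining on it.

    samples: (unix_seconds, current_block) pairs, oldest first.
    """
    if len(samples) < 2:
        return False  # not enough data to estimate a rate
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    minutes = (t1 - t0) / 60
    if minutes <= 0 or head - b1 <= 0:
        return False  # bad timestamps, or the node has already caught up
    import_rate = (b1 - b0) / minutes
    # A node importing no faster than the chain grows never closes the gap.
    return import_rate <= CHAIN_BLOCKS_PER_MINUTE
```

For example, a node that advanced only 20 blocks in 10 minutes (2 blocks per minute) while still behind is flagged, because the chain itself grows faster than that.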
You can also use Amazon EBS snapshots instead of copying data to and from S3, but nodes initialized that way experience higher I/O latency for a long time while their snapshot blocks are loaded from S3. In our experiments, copying data from S3 with the s5cmd tool took about 36 minutes per 1 TiB, whereas initializing an EBS volume from a snapshot without the Amazon EBS fast snapshot restore feature might take many hours.
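The 36 minutes per TiB figure can be turned into a quick estimate for planning. This sketch uses that measured rate as a default that your own throughput may not match. It also derives the implied sustained rate: 1 TiB (1,048,576 MiB) in 2,160 seconds is about 485 MiB/s.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2
S5CMD_MINUTES_PER_TIB = 36  # rate measured in this post; yours may differ


def estimated_copy_minutes(size_bytes: int,
                           minutes_per_tib: float = S5CMD_MINUTES_PER_TIB) -> float:
    """Estimate how long s5cmd needs to copy a data directory of the given size."""
    return size_bytes / TIB * minutes_per_tib


def implied_throughput_mib_s(minutes_per_tib: float = S5CMD_MINUTES_PER_TIB) -> float:
    """Convert a minutes-per-TiB rate into sustained MiB/s."""
    return (TIB / MIB) / (minutes_per_tib * 60)
```

For a hypothetical 2 TiB data directory, the estimate is 72 minutes. Keep in mind that the destination EBS volume must be provisioned for at least the implied throughput, otherwise the volume rather than s5cmd sets the pace.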
Keeping your own copy of the client’s blockchain data on AWS works even better for EL clients such as Erigon that operate as archive nodes and don’t support snap sync. Such clients take several days to download all the necessary data and construct the final state, but after a copy is available, a new node can be up and running in 2 to 3 hours.
Solution overview
To implement these ideas, we set up three Amazon Elastic Compute Cloud (Amazon EC2) instances and run all clients on cost-effective AWS Graviton processors. Each node has two EBS volumes attached: one as the root volume and another to hold the blockchain state data. We call one node the “sync node” and dedicate it to synchronizing with the rest of Ethereum mainnet. The other two nodes are “RPC nodes”, which provide the RPC API to user applications (also called decentralized apps, or dApps for short). Although the nodes are functionally the same, separating deployments by role improves the scalability and reliability of the solution.
The RPC nodes are part of an Amazon EC2 Auto Scaling group so they can be provisioned quickly from the copy of the sync node’s data. Both RPC nodes sit behind an Application Load Balancer that distributes the load between them. You can also use different EC2 instance types for the sync node and the RPC nodes to strike a better balance between cost and performance. The sync node can use a smaller instance type, because its sole purpose is to keep up with the chain head. For example, we used AWS Compute Optimizer to help right-size the sync node and found that an r6g.2xlarge instance with an attached EBS gp3 volume provisioned with 5,700 IOPS and 250 MiB/s of throughput is sufficient for a Go Ethereum sync node using LevelDB; other instance types might work better for other clients. The RPC nodes might need larger EC2 instances and EBS volumes, depending on which APIs applications call and how often. After initializing from the data copy produced by the sync node, the RPC nodes keep synchronizing with other nodes in the Ethereum network to serve up-to-date information through their APIs. At the same time, they can be disposed of and replaced by new RPC node instances to scale up, replace faulty nodes, or upgrade client software.
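The sync node’s data volume can be expressed as a BlockDeviceMappings entry in the shape that the EC2 RunInstances API and launch templates accept. This sketch only builds and sanity-checks the dictionary. The volume size and device name are hypothetical. The gp3 baseline (3,000 IOPS, 125 MiB/s) and the 0.25 MiB/s-per-IOPS throughput ratio are assumptions to verify against current Amazon EBS documentation.

```python
from typing import Dict

# Assumed gp3 constraints; verify against current Amazon EBS documentation.
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_MIB_S = 125
GP3_MAX_MIB_S_PER_IOPS = 0.25


def gp3_data_volume(size_gib: int, iops: int, throughput_mib_s: int,
                    device_name: str = "/dev/sdf") -> Dict:
    """Return one BlockDeviceMappings entry for a gp3 blockchain data volume."""
    if iops < GP3_BASELINE_IOPS or throughput_mib_s < GP3_BASELINE_MIB_S:
        raise ValueError("gp3 cannot be provisioned below its baseline")
    if throughput_mib_s > iops * GP3_MAX_MIB_S_PER_IOPS:
        raise ValueError("throughput exceeds the assumed MiB/s-per-IOPS ratio")
    return {
        "DeviceName": device_name,  # hypothetical device name
        "Ebs": {
            "VolumeSize": size_gib,       # GiB
            "VolumeType": "gp3",
            "Iops": iops,
            "Throughput": throughput_mib_s,  # MiB/s
            # Disposable RPC nodes can drop the volume on termination;
            # for the long-lived sync node you may prefer False.
            "DeleteOnTermination": True,
        },
    }
```

With the sync node settings described above, `gp3_data_volume(4000, 5700, 250)` passes both checks, with 4,000 GiB as a placeholder size.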
The following diagram illustrates the deployment architecture.
The workflow steps are as follows:
- The sync node and the RPC nodes continuously synchronize data with other nodes in the Ethereum network.
- The sync node periodically updates the copy of its state data in the S3 bucket using the s5cmd tool. Key metrics of the block sync progress are sent every 5 minutes to an automatically created Amazon CloudWatch dashboard for monitoring.
- When a new RPC node is provisioned, it copies the state data from the S3 bucket to speed up the initial sync. It stays in the Pending:Wait state of the Auto Scaling lifecycle while the data is copied and automatically registers with the Application Load Balancer when ready.
- Applications and smart contract development tools can now access highly available RPC nodes behind the Application Load Balancer with restricted access.
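The RPC node bootstrap step in this workflow can be sketched as follows, assuming a launch lifecycle hook holds the instance in Pending:Wait. The function copies the data with s5cmd and then calls CompleteLifecycleAction: CONTINUE lets the instance proceed to InService, and ABANDON on a failed copy leads Auto Scaling to terminate it. The `autoscaling` argument is expected to behave like a boto3 Auto Scaling client, for example `boto3.client("autoscaling")`. The bucket, hook, group, and instance names are hypothetical, and this is not necessarily how the sample application implements it.

```python
import subprocess
from typing import Callable, List


def run_command(cmd: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def restore_and_continue(autoscaling, bucket_prefix: str, data_dir: str,
                         hook_name: str, asg_name: str, instance_id: str,
                         runner: Callable[[List[str]], int] = run_command) -> str:
    """Copy the sync node's data from S3, then release the lifecycle hook."""
    # The wildcard is passed to s5cmd itself (no shell), which expands it in S3.
    cmd = ["s5cmd", "cp", f"{bucket_prefix.rstrip('/')}/*", data_dir.rstrip("/") + "/"]
    result = "CONTINUE" if runner(cmd) == 0 else "ABANDON"
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult=result,
        InstanceId=instance_id,
    )
    return result
```

Injecting the client and the command runner keeps the sketch testable without AWS credentials. On a real instance, it would run from user data or a systemd unit before the clients start.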
If you don’t expect to serve many user requests, you can instead create only one RPC node in the Auto Scaling group and run your sync node in another Availability Zone as a backup for redundancy.
In this post, we used a popular client combination, the Go Ethereum execution client with the Lighthouse consensus client, but the same setup works with other clients too. With the sample AWS Cloud Development Kit (AWS CDK) application in the accompanying GitHub repository, you can set up several other client combinations. Feel free to share your own configurations with pull requests.
Conclusion
In this post, we discussed Ethereum nodes and clients, the initial sync process used by Ethereum nodes, and how to speed it up using checkpoint sync in consensus layer clients and snap sync in some execution layer clients. Then we dived deeper into how to use cloud infrastructure to scale Ethereum nodes faster and shorten the initial sync time even further with a copy of the data in Amazon S3. Finally, we walked through how the solution works step by step. Now you are ready to try out the sample CDK application and deploy the nodes yourself.
About the Authors
Nikolay Vlasov is a Senior Solutions Architect with the AWS Worldwide Specialist Solutions Architect organization, focused on blockchain-related workloads. He helps clients turn their ideas into pilots, minimum viable products, and production-ready systems based on blockchain technology.
Aldred Halim is a Solutions Architect with the AWS Worldwide Specialist Solutions Architect organization. He works closely with customers in designing architectures and building components to ensure success in running blockchain workloads on AWS.
Writom Guha Roy is a Web3/Fintech Startups Solutions Architect at AWS. In this role, Writom owns technical relationships with Web3/Fintech startups based in Singapore and is their trusted advisor, helping them design their workloads optimally on the AWS Cloud.