AWS Database Blog
Build multi-tenant architectures on Amazon Neptune
Graph database technology has seen a rise in popularity over the last several years, as more and more customers need to extract value from their highly connected data in an efficient and scalable way while gaining flexibility that relational databases can’t provide. Amazon Neptune is purpose-built to store and navigate relationships. This provides advantages over relational databases for use cases like social networking, recommendation engines, and fraud detection, where you need to create relationships between data and quickly query these relationships. Built on open standards, Neptune enables developers to use three popular open source graph query languages: Apache TinkerPop Gremlin, RDF/SPARQL, and openCypher.
When operating in a multi-tenant software as a service (SaaS) environment, using Neptune introduces a few challenges that require considerations of how to partition, isolate, and deploy tenants inside your solution. In this post, we explore approaches that address operating Neptune in a multi-tenant SaaS environment, as well as the considerations that may influence how and when to apply these strategies depending on your tenant needs.
Overview of graph data models
A graph consists of two primary constructs:
- A vertex or a node
- An edge or a relationship
In a graph database world, we have two prominent data models:
- Labeled property graphs (LPG) – Apache TinkerPop and openCypher are open standard projects for property graphs. In this graph data model, we have a set of vertices connected by edges. Both vertices and edges are required to have labels. Additionally, both edges and vertices have zero or more attributes represented as key-value pairs (for example, property_name = property_value), as illustrated in the following diagram. There is no explicit schema for LPG, which is a key differentiator when it comes to data modeling.
- Resource description framework (RDF) – The RDF is a World Wide Web Consortium (W3C) standard created to model metadata. It’s a standard model for data interchange on the web. Information is expressed in sentence-like triples of subject-predicate-object (SPO). These SPO triples are used to represent statements and retrieve insights about semantic data. For more information on RDF syntax, refer to N-Triples. In RDF, these triples can be represented on a graph like the following diagram.
SaaS data partitioning models
One of the challenges for SaaS developers is designing architectural patterns for representing and organizing data in a multi-tenant environment. These multi-tenant storage mechanisms and patterns are typically referred to as data partitioning.
In a multi-tenant SaaS environment, it’s important to distinguish between data partitioning and tenant isolation. These concepts, although related, are not synonymous. Data partitioning refers to the method of storing data for each tenant. However, partitioning alone doesn’t guarantee tenant isolation. Additional measures are necessary to make sure the data of one tenant remains inaccessible to another.
There are three common data partitioning models in multi-tenant SaaS systems: silo, pool, and hybrid. Your choice of model will depend on factors such as compliance, noisy neighbors, tiering strategy, operational requirements, and tenant isolation needs. Additionally, each AWS database technology typically offers its own unique collection of data partitioning and tenant isolation models. This also applies to Neptune when looking at how tenant graphs can be organized to support the various needs of your solution.
Silo model
Some multi-tenant SaaS environments may require tenants’ data to be deployed on fully separated resources, for example due to noisy neighbor concerns or compliance and regulatory requirements. This is where the silo model is applied. In the silo model, storage of tenant data is fully isolated from any other tenant data. All constructs that are used to represent the tenant’s data are considered physically unique to that client, meaning that each tenant will generally have a distinct storage, monitoring, management, and security footprint.
Cluster per tenant
You can implement a silo model with Neptune by having an individual cluster per tenant, as shown in the following diagram. Each cluster has its individual endpoints, providing distinct access points for efficient data interaction and management. By placing each tenant in its own cluster, you create a well-defined boundary between tenants, assuring customers that their data is successfully isolated from other tenants’ data. This isolation is also appealing for SaaS solutions that have strict regulatory and security constraints because each cluster can be encrypted at rest with customer managed keys using AWS Key Management Service (AWS KMS). Additionally, with each tenant having their own cluster, you don’t have to worry about noisy neighbors, where one tenant may impose a load that could adversely affect the experience of other tenants.
Although the cluster-per-tenant model has its own advantages, it also introduces management and agility challenges. The distributed nature of this model makes it harder to aggregate and assess the operational health across all tenants and activity of tenants. Deployment also becomes more challenging because the onboarding of a new tenant now requires the provisioning of a separate cluster.
When using a Neptune provisioned cluster per tenant, you must select an instance size that approximates the maximum load your tenants demand. This dependence on a fixed instance size also has a cascading impact on the scaling efficiency and cost of your SaaS environment. Although our goal with SaaS is to always size dynamically based on actual tenant load, a Neptune provisioned cluster requires us to over-provision to account for heavier periods of usage and spikes in load, thereby increasing the cost per tenant. Additionally, as each tenant’s usage changes over time, scaling the cluster up or down must be applied separately for each tenant. Thankfully, a serverless option for Neptune addresses some of the major downsides of the silo model.
Amazon Neptune Serverless is an on-demand auto scaling configuration for Neptune that is architected to scale your DB cluster as needed to meet increases in processing demand, and then scale down again when the demand decreases. For example, Neptune Serverless can scale down to as little as 1 NCU, which significantly reduces cost during idle periods. By deploying a dedicated Neptune Serverless cluster per tenant, you can isolate your tenants’ data while removing the need to monitor each tenant’s workload and having to manually adjust capacity to meet their needs. Because you’re charged only for the resources that your application actually needs, you can cost-optimize the workload to fit the demands of each tenant’s usage.
Tenant isolation for the silo model
To implement the cluster-per-tenant silo isolation model, you can create AWS Identity and Access Management (IAM) data access policies to control access to tenants’ Neptune clusters. These policies prevent one tenant from accessing another tenant’s data. The IAM policy for each tenant should be attached to an IAM role. The application microservice then uses the IAM role to generate fine-grained temporary credentials using the AssumeRole method of AWS Security Token Service (AWS STS). These credentials, which have access only to the Neptune cluster for that tenant, are used to connect to the tenant’s Neptune cluster.
The following code snippet shows a sample data-based IAM policy used to provide a sample tenant (tenant-1) with read and write query access to their respective Neptune cluster. The “Condition” element makes sure that only the calling entity (the principal), which has assumed tenant-1’s IAM role (tenant-role-1), is allowed to access tenant-1’s Neptune cluster.
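A sketch of such a policy follows; the Region, account ID, cluster resource ID, and role name are placeholders, and the actions shown are Neptune's data-plane query actions (adjust them to the access your tenants need):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowTenant1NeptuneQueries",
      "Effect": "Allow",
      "Action": [
        "neptune-db:ReadDataViaQuery",
        "neptune-db:WriteDataViaQuery"
      ],
      "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-TENANT1RESOURCEID/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalArn": "arn:aws:iam::123456789012:role/tenant-role-1"
        }
      }
    }
  ]
}
```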
Pool model
Sometimes it isn’t necessary or feasible to implement the silo model because of cost or operational overhead. You may not have the resources to maintain an individual cluster per tenant, or it may not be necessary to physically separate your tenants’ data and a logical separation is enough to meet their needs and compliance requirements. The pool model allows you to place tenants’ data into a single Neptune cluster where all tenants share a common database, as shown in the following diagram. This model reduces the management overhead and can improve the operational efficiency.
There are different ways to model data with the pool model. Your approach differs depending on whether you’re building a labeled property graph or an RDF graph.
Pool model for LPG
There are three different approaches to the pool model for labeled property graphs on Neptune.
Partitioning tenant data using the property strategy
LPGs allow users to add properties to nodes and edges. To achieve logical separation, you can add a unique tenant identifier as properties to vertices and edges on the graph. The following example demonstrates this, with TId representing a tenant’s logical ID, and P and R labels representing edges and nodes, respectively.
Within labeled property graphs, there are two ways to manage this. The Gremlin query language offers a traversal library, known as PartitionStrategy, to help manage the partitioning of the data. In practice, this looks like the following code:
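A sketch in Gremlin (Groovy syntax), assuming an existing traversal source `g`; the tenant ID and property key are illustrative:

```groovy
// PartitionStrategy injects the partition key/value on writes and filters reads
g = g.withStrategies(PartitionStrategy.build().
        partitionKey('TId').
        writePartition('tenant-1').
        readPartitions('tenant-1').
        create())

// TId = 'tenant-1' is added automatically to the new vertex
g.addV('R').property('name', 'Alice')

// Reads are transparently filtered to elements where TId = 'tenant-1'
g.V().valueMap()
```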
In openCypher, these libraries don’t exist. You are responsible for writing and modifying your queries to add the tenant ID as a property on nodes and edges. For example:
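A sketch, using the illustrative R and P labels and a hypothetical tenant ID:

```cypher
CREATE (a:R {name: 'Alice', TId: 'tenant-1'})-[e:P {TId: 'tenant-1'}]->(b:R {name: 'Bob', TId: 'tenant-1'})
```

Every subsequent read must likewise filter on the property, for example `MATCH (n:R {TId: 'tenant-1'}) RETURN n`.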
Here, TId is listed as a property on creation.
Partitioning tenant data using the prefix label strategy
Another way to achieve this would be to append your logical ID to labels on vertices, as shown in the following figure.
When writing data in Gremlin, you can append a tenant ID to any node’s label:
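For example (the tenant ID and label are illustrative):

```groovy
g.addV('tenant-1-R').property('name', 'Alice')
```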
When querying this graph, you can check for the existence of this prefix on a node:
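A sketch; the simplest form matches the fully prefixed label, and where TextP predicates are supported in hasLabel you can match on the prefix alone:

```groovy
// Match the tenant's nodes via the prefixed label
g.V().hasLabel('tenant-1-R').valueMap()

// Where supported, filter on the prefix itself
g.V().hasLabel(TextP.startingWith('tenant-1-')).valueMap()
```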
In openCypher, you can write data as follows:
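For example (backticks allow the hyphenated label; the names are illustrative):

```cypher
CREATE (n:`tenant-1-R` {name: 'Alice'})
```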
To query this data, use the following code:
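A sketch:

```cypher
MATCH (n:`tenant-1-R`) RETURN n
```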
Partitioning tenant data using the multi-label strategy
Finally, you can use a multi-label strategy, which requires you to add an extra label to every vertex on the graph with the tenant’s ID. There are two ways to achieve this.
In Gremlin, you can write new vertices with the tenant ID as an additional label and use a traversal strategy called SubgraphStrategy to query your graph, filtering on the existence of this label:
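A sketch (Groovy syntax; Neptune’s `::` delimiter attaches multiple labels to a vertex, and the tenant label shown is illustrative):

```groovy
// Write the vertex with both its type label and the tenant label
g.addV('R::tenant-1').property('name', 'Alice')

// SubgraphStrategy restricts every traversal to vertices carrying the tenant label
g = g.withStrategies(SubgraphStrategy.build().
        vertices(hasLabel('tenant-1')).
        create())
g.V().valueMap()   // only tenant-1's vertices are visible
```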
In openCypher, this strategy doesn’t exist, which means on creation you must assign two labels per node:
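For example (illustrative names):

```cypher
CREATE (n:R:`tenant-1` {name: 'Alice'})
```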
When filtering for a subgraph with these labels, you can return nodes that have the customer label you’re looking for or share a relationship with another node that has that label:
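A sketch; the union of the two patterns returns the tenant’s own nodes plus nodes connected to them (assuming UNION is available in your openCypher engine):

```cypher
MATCH (n:`tenant-1`) RETURN n
UNION
MATCH (n)--(:`tenant-1`) RETURN n
```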
Pool model for the RDF
In the RDF, we have a concept of named graphs, which provides a logical way of separating data. In Neptune, you have a default named graph and user-defined named graphs. You can create as many named graphs as you want; collectively, they are called the RDF dataset. Unless you declare a named graph when writing data, Neptune considers all triples part of the default named graph. The following figure shows an RDF dataset in both graph and tabular format.
There are multiple use cases for named graphs, such as:
- Data partitioning and data isolation
- Data provenance
- Versioning
- Inference
For this post, we focus on a data partitioning use case for the pool model. We recommend creating one user-defined named graph for each tenant.
SPARQL query options using Graph Store Protocol
The following are some sample queries using Graph Store Protocol to query or create a named graph for a tenant:
- HTTP GET – Retrieves a specific graph of a tenant:
- HTTP PUT – Creates or replaces a specific named graph with a payload provided with the request:
- HTTP POST – Creates a new named graph if one doesn’t exist, or merges with the existing graph:
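The following sketches show what these requests might look like with curl; the endpoint, port, and graph URI are placeholders, and the graph URI must be URL-encoded as shown:

```bash
# Neptune exposes the Graph Store Protocol under /sparql/gsp/
NEPTUNE=https://my-cluster.cluster-abc.us-east-1.neptune.amazonaws.com:8182
GRAPH=https%3A%2F%2Fexample.com%2Ftenants%2Ftenant-1

# GET: retrieve tenant-1's named graph
curl "$NEPTUNE/sparql/gsp/?graph=$GRAPH" -H 'Accept: text/turtle'

# PUT: create or replace tenant-1's named graph with the payload
curl -X PUT "$NEPTUNE/sparql/gsp/?graph=$GRAPH" \
     -H 'Content-Type: text/turtle' --data-binary @tenant-1.ttl

# POST: create the graph if absent, or merge the payload into it
curl -X POST "$NEPTUNE/sparql/gsp/?graph=$GRAPH" \
     -H 'Content-Type: text/turtle' --data-binary @tenant-1-new.ttl
```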
Tenant isolation for the pool model
For the pool model, logical data isolation within the same Neptune cluster is the goal.
Tenant isolation for LPG
With LPGs, you can take three approaches:
- Create a property on vertices and edges with a tenant ID
- Use an ID prefix on all vertex labels
- Create multiple labels per vertex (with one label being tenant ID)
You can query this graph from your microservice by filtering on a property or label in Gremlin and openCypher, or on a named graph in SPARQL.
Tenant isolation for the RDF
You can use a user-defined named graph for logical isolation of data with the necessary guardrails in place at the application layer with a mapping between the tenant and user-defined named graphs. The following are some important aspects of the RDF and its query language, SPARQL, that you need to be aware of when you’re designing multi-tenancy for an RDF dataset:
- In Neptune, when you query against the default named graph, it retrieves all triples from the dataset, including those from user-defined named graphs.
- There are no constraints around connections between nodes of different named graphs in the RDF. For instance, in the preceding diagram, you can have one node from :G1 connected to another node in :G2 through an edge.
For example, if an end-user of a particular tenant submits a query to the API, the application should validate the following before it submits the query to the Neptune database:
- No parts of your application use the default named graph, because this could expose tenants to each other’s data
- The end-user is authorized to access a specified user-defined named graph
- UPDATE or DELETE queries should always have a tenant-specific user-defined named graph
- Nodes on either side of an edge (relationship) should always belong to the given tenant's user-defined named graph
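Putting these checks together, a tenant-scoped read issued by the application might look like the following, with a hypothetical graph URI per tenant:

```sparql
# Scope the query to tenant-1's user-defined named graph only
SELECT ?s ?p ?o
WHERE {
  GRAPH <https://example.com/tenants/tenant-1> { ?s ?p ?o }
}
```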
For more information about best practices, refer to SPARQL standards compliance in Amazon Neptune.
Hybrid model
It’s common to encounter SaaS solutions that utilize a blend of silo and pool models. Various factors may influence the decision of when and how to employ both silo and pool models within the same environment.
One such factor is the tiering strategy, where SaaS solutions offer unique experiences to each tier of tenants such as Free, Standard, and Premium. For instance, your Free tier tenant data could be stored within a shared Neptune cluster using a pool model, whereas your Standard and Premium tier tenants could align to a cluster-per-tenant silo model.
Additionally, some SaaS providers have the capability to build their solution on a shared Neptune cluster as their foundation. Subsequently, they can create a separate Neptune cluster for tenants that require siloed storage, often due to compliance and regulatory mandates.
Although this can add a level of complexity to your data access layer and management profile, it can also offer your business a way to tier your offering to represent the best of both worlds.
Conclusion
In this post, we looked at some key strategies to consider when building a multi-tenant solution on Neptune.
There is no single preferred model for the solutions that are outlined in this post. Each variation highlights some of the natural tension that exists in SaaS design. When picking a partitioning strategy, you must balance the simplicity and agility of a fully shared model with the security and variability offered by more isolated models.
To learn more, see Amazon Neptune, Software-as-a-Service (SaaS) on AWS, and the Amazon Neptune User Guide.
About the Authors
Dana Owens is a Startups Solutions Architect for AWS and is passionate about helping developers build on cloud-centered databases. She has multiple years of experience working with AWS customers, and specializes in healthcare and life sciences and the AWS cloud-centered graph database, Amazon Neptune.
Nima Seifi is a Solutions Architect helping early-stage startups develop and deploy their applications on AWS. Prior to AWS, he worked as a DevOps architect within retail and digital commerce industries for over 5 years, following a decade of R&D work in mobile internet technology. His current focus at AWS is on the dynamic areas of SaaS, Web3, and generative AI.
Veeresham Gande is a Sr. Technical Account Manager at AWS. He is passionate about databases, and works with AWS customers to help understand their business and technical needs, align technical solutions, and achieve the greatest value from AWS.