AWS News Blog

New – Amazon Managed Apache Cassandra Service (MCS)

Voiced by Polly

Managing databases at scale is never easy. One of the options to store, retrieve, and manage large amounts of structured data, including key-value and tabular formats, is Apache Cassandra. With Cassandra, you can use the expressive Cassandra Query Language (CQL) to build applications quickly.

However, managing large Cassandra clusters can be difficult and takes a lot of time. You need specialized expertise to set up, configure, and maintain the underlying infrastructure, and have a deep understanding of the entire application stack, including the Apache Cassandra open source software. You need to add or remove nodes manually, rebalancing partitions, and doing so while keeping your application available with the required performance. Talking with customers, we found out that they often keep their clusters scaled up for peak load because scaling down is complex. To keep your Cassandra cluster updated, you have to do it node by node. It’s hard to backup and restore a cluster if something goes wrong during an update, and you may end up skipping patches or running an outdated version.

Introducing Amazon Managed Cassandra Service
Today, we are launching in open preview Amazon Managed Apache Cassandra Service (MCS), a scalable, highly available, and managed Apache Cassandra-compatible database service. Amazon MCS is serverless, so you pay for only the resources you use and the service automatically scales tables up and down in response to application traffic. You can build applications that serve thousands of requests per second with virtually unlimited throughput and storage.

With Amazon MCS, you can run your Cassandra workloads on AWS using the same Cassandra application code and developer tools that you use today. Amazon MCS implements the Apache Cassandra version 3.11 CQL API, allowing you to use the code and drivers that you already have in your applications. Updating your application is as easy as changing the endpoint to the one in the Amazon MCS service table.

Amazon MCS provides consistent single-digit-millisecond read and write performance at any scale, so you can build applications with low latency to provide a smooth user experience. You have visibility into how your application is performing using Amazon CloudWatch.

There is no limit on the size of a table or the number of items, and you do not need to provision storage. Data storage is fully managed and highly available. Your table data is replicated automatically three times across multiple AWS Availability Zones for durability.

All customer data is encrypted at rest by default. You can use encryption keys stored in AWS Key Management Service (AWS KMS). Amazon MCS is also integrated with AWS Identity and Access Management (IAM) to help you manage access to your tables and data.

Using Amazon Managed Cassandra Service
You can use Amazon MCS with the console, CQL, or existing Apache 2.0 licensed Cassandra drivers. In the console there is a CQL editor, or you can connect using cqlsh. 

To connect using cqlsh, I need to generate service-specific credentials for an existing IAM user. This is just a command using the AWS Command Line Interface (AWS CLI):

aws iam create-service-specific-credential --user-name danilop --service-name cassandra.amazonaws.com

{
    "ServiceSpecificCredential": {
        "CreateDate": "2019-11-27T14:36:16Z",
        "ServiceName": "cassandra.amazonaws.com",
        "ServiceUserName": “danilop-at-123412341234",
        "ServicePassword": "...",
        "ServiceSpecificCredentialId": "...",
        "UserName": “danilop",
        "Status": "Active"
    }
}

I can also use the IAM console to manage service-specific credentials. For example, I select my IAM user and the Security credentials tab. There, I can generate, delete, make inactive, or reset the password for Amazon MCS credentials.

Amazon MCS only accepts secure connections using TLS.  I download the Amazon root certificate and edit the cqlshrc configuration file to use it. Now, I can connect with:

cqlsh {endpoint} {port} -u {ServiceUserName} -p {ServicePassword} --ssl

First, I create a keyspace. A keyspace contains one or more tables and defines the replication strategy for all the tables it contains. With Amazon MCS the default replication strategy for all keyspaces is the Single-region strategy. It replicates data 3 times across multiple Availability Zones in a single AWS Region.

To create a keyspace I can use the console or CQL. In the Amazon MCS console, I provide the name for the keyspace.

Similarly, I can use CQL to create the bookstore keyspace:

CREATE KEYSPACE IF NOT EXISTS bookstore WITH REPLICATION={'class': 'SingleRegionStrategy'};

Now I create a table. A table is where your data is organized and stored. Again, I can use the console or CQL. From the console, I select the bookstore keyspace and give the table a name.

Below that, I add the columns for my books table. Each row in a table is referenced by a primary key, that can be composed of one or more columns, the values of which determine which partition the data is stored in. In my case the primary key is the ISBN. Optionally, I can add clustering columns, which determine the sort order of records within a partition. I am not using clustering columns for this table.

Alternatively, using CQL, I can create the table with the following commands:

USE bookstore;

CREATE TABLE IF NOT EXISTS books
(isbn text PRIMARY KEY,
title text,
author text,
pages int,
year_of_publication int);

I now use CQL to insert a record in the books table:

INSERT INTO books (isbn, title, author, pages, year_of_publication)
VALUES ('978-0201896831',
'The Art of Computer Programming, Vol. 1: Fundamental Algorithms (3rd Edition)',
'Donald E. Knuth', 672, 1997);

Let’s run a quick query. In the console, I select the books table and then Query table.

In the CQL Editor, I use the default query and select Run command.

By default, I see the result of the query in table view:

If I prefer, I can see the result in JSON format, similar to what an application using the Cassandra API would see:

To insert more records, I use csqlsh again and upload some data from a local CSV file:

COPY books (isbn, title, author, pages, year_of_publication)
FROM './books.csv' WITH delimiter=',' AND header=TRUE;

Now I look again at the content of the books table:

SELECT * FROM books;

I can select a row using a primary key, or use filtering for additional conditions. For example:

SELECT title FROM books WHERE isbn='978-1942788713';

SELECT title FROM books WHERE author='Scott Page' ALLOW FILTERING;

With Amazon MCS you can use existing Apache Cassandra 2.0–licensed drivers and developer tools. Open-source Cassandra drivers are available for Java, Python, Ruby, .NET, Node.js, PHP, C++, Perl, and Go.

You can learn more in the Amazon MCS documentation.

Available in Open Preview
Amazon Managed Cassandra Service is available today in open preview in US East (N. Virginia), US East (Ohio), Europe (Stockholm), Asia Pacific (Singapore), Asia Pacific (Tokyo).

As we work with the Cassandra API libraries, we are contributing bug fixes to the open source Apache Cassandra project. We are also contributing back improvements such as built-in support for AWS authentication (SigV4), which simplifies managing credentials for customers running Cassandra on Amazon Elastic Compute Cloud (Amazon EC2), since EC2 and IAM can handle distribution and management of credentials using instance roles automatically. We are also announcing the funding of AWS promotional service credits for testing Cassandra-related open-source projects. To learn more about these contributions, visit the Open Source blog.

During the preview, you can use Amazon MCS with on-demand capacity. At general availability, we will also offer the option to use provisioned throughput for more predictable workloads. With on-demand capacity mode, Amazon MCS charges you based on the amount of data your applications read and write from your tables. You do not need to specify how much read and write throughput capacity to provision to your tables because Amazon MCS accommodates your workloads instantly as they scale up or down.

As part of the AWS Free Tier, you can get started with Amazon MCS for free. For the first three months, you are offered a monthly free tier of 30 million write request units, 30 million read request units, and 1 GB of storage. Your free tier starts when you create your first Amazon MCS resource.

Next year we are making it easier to migrate your data to Amazon MCS, adding support to use AWS Database Migration Service (AWS DMS).

Amazon MCS makes it easy to use Cassandra workloads at any scale, providing a simple programming interface to build new applications, or migrate existing ones. I can’t wait to see what are you going to use it for!

Danilo Poccia

Danilo Poccia

Danilo works with startups and companies of any size to support their innovation. In his role as Chief Evangelist (EMEA) at Amazon Web Services, he leverages his experience to help people bring their ideas to life, focusing on serverless architectures and event-driven programming, and on the technical and business impact of machine learning and edge computing. He is the author of AWS Lambda in Action from Manning.