AWS Big Data Blog
Simplify management of database privileges in Amazon Redshift using role-based access control
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises don’t want to have a concentration of superuser privileges amongst a handful of users. Instead, enterprises want to design their overarching security posture based on the specific duties performed via roles and assign these elevated privilege roles to different users. By assigning different privileges to different roles and assigning these roles to different users, enterprises can have more granular control of elevated user access.
In this post, we explore the role-based access control (RBAC) features of Amazon Redshift and how you can use roles to simplify managing privileges required to your end-users. We also cover new system views and functions introduced alongside RBAC.
Overview of RBAC in Amazon Redshift
As a security best practice, it’s recommended to design security by applying the principle of least privileges. In Amazon Redshift, RBAC applies the same principle to users based on their specific work-related role requirements, regardless of the type of database objects involved. This granting of privileges is performed at a role level, without the need to grant permissions for the individual user or user groups. You have four system-defined roles to get started, and can create additional, more granular roles with privileges to run commands that used to require the superuser privilege. With RBAC, you can limit access to certain commands and assign roles to authorized users. And you can assign object-level as well as system-level privileges to roles across Amazon Redshift native objects.
System-defined roles in Amazon Redshift
Amazon Redshift provides four system-defined roles that come with specific privileges. These can’t be altered or customized, but you can create your own roles as required. The system-defined roles use the sys:
prefix, and you can’t use this prefix for the roles you create.
The following table summarizes the roles and their privileges.
Role Name | Description of Privileges |
sys:operator |
Can access catalog or system tables, and analyze, vacuum, or cancel queries. |
sys:dba |
Can create schemas, create tables, drop schemas, drop tables, truncate tables, create or replace stored procedures, drop procedures, create or replace functions, create or replace external functions, create views, and drop views. Additionally, this role inherits all the privileges from the sys:operator role. |
sys:superuser |
Has the same privileges as the Amazon Redshift superuser. |
sys:secadmin |
Can create users, alter users, drop users, create roles, drop roles, and grant roles. This role can have access to user tables only when the privilege is explicitly granted to the role. |
System privileges
Amazon Redshift also adds support for system privileges that can be granted to a role or a user. A system privilege allows admins to grant a limited set of privileges to a user, such as the ability to create and alter users. These system-defined privileges are immutable and can’t be altered, removed, or added to.
Create custom roles for RBAC in Amazon Redshift
To further granularize the system privileges being granted to users to perform specific tasks, you can create custom roles that authorize users to perform those specific tasks within the Amazon Redshift cluster.
RBAC also supports nesting of roles via role hierarchy, and Amazon Redshift propagates privileges with each role authorization. In the following example, granting role R1 to role R2 and then granting role R2 to role R3 authorizes role R3 with all the privileges from the three roles. Therefore, by granting role R3 to a user, the user has all the privileges from roles R1, R2, and R3.
Amazon Redshift doesn’t allow creation of a cyclic role authorization cycle, so role R3 can’t be granted to role R1, as that would be cyclic role authorization.
You can use the Amazon Redshift commands for privileges to create role, grant role, revoke role, and the admin options for the grant and revoke. Only superusers or regular users who have been granted create role privileges can use those commands.
RBAC example use cases
For this post, we use the industry standard TPC-H dataset to demonstrate our example use cases.
We have three different teams in the organization: Sales, Marketing, and Admin. For this example, we have two schemas, sales
and marketing
, in the Amazon Redshift database. Each schema has the following tables: nation
, orders
, part
, partsupp
, supplier
, region
, customer
, and lineitem
.
We have two different database roles, read-only and read/write, for both the Sales team and Marketing team individually. Each role can only perform operations to the objects belonging to the schema to which the role is assigned. For example, a role assigned to the sales
schema can only perform operations based on assigned privileges to the sales
schema, and can’t perform any operation on the marketing
schema.
The read-only role has read-only access to the objects in the respective schema when the privilege is granted to the objects.
The read/write role has read and write (insert, update) access to the objects in the respective schema when the privileges are granted to the objects.
The Sales team has read-only ( role name sales_ro
) and read/write ( role name sales_rw
) privileges.
The Marketing team has similar roles: read-only ( role name marketing_ro
) and read/write ( role name marketing_rw
).
The Admin team has one role (db_admin
), which has privileges to drop or create database roles, truncate tables, and analyze the entire database. The admin role can perform at the database level across both sales
and marketing
schemas.
Set up for the example use cases
To set up for the example use cases, create a database admin role and attach it to a database administrator. A superuser must perform all these steps.
All the queries for this post are run in the Amazon Redshift native Query Editor v2, but can be run just the same in any query editor, such as SQLWorkbench/J.
- Create the admin role (
db_admin
): - Create a database user named
dbadmin
: - Assign a system-defined role named
sys:dba
to thedb_admin
role:
This role has the privileges to create schemas, create tables, drop schemas, drop tables, truncate tables, create or replace stored procedures, drop procedures, create or replace functions, create or replace external functions, create views, drop views, access catalog or system tables, analyze, vacuum, and cancel queries.
- Assign a system-defined role named
sys:secadmin
to thedb_admin
role:
This role has the privileges to create users, alter users, drop users, create roles, drop roles, and grant roles.
- Assign the user
dbadmin
to thedb_admin
role:
From this point forward, we use the dbadmin
user credential for performing any of the following steps when no specific user is mentioned.
- Create the
sales
andmarketing
database schema: - Create all the eight tables (
nation
,orders
,part
,partsupp
,supplier
,region
,customer
,lineitem
) in thesales
andmarketing
schemas.
You can use the DDL available on the GitHub repo to create and populate the tables.
After the tables are created and populated, let’s move to the example use cases.
Example 1: Data read-only task
Sales analysts may want to get the list of suppliers with minimal cost. For this, the sales analyst only needs read-only access to the tables in the sales
schema.
- Let’s create the read-only role (
sales_ro
) in the sales schema: - Create a database user named
salesanalyst
: - Grant the
sales
schema usage and select access to objects of thesales
schema to the read-only role: - Now assign the user to the read-only sales role:
Now the salesanalyst
database user can access the sales
schema in the Amazon Redshift database using the salesanalyst
credentials.
The salesanalyst
user can generate a report of least-expensive suppliers using the following query:
The salesanalyst
user can successfully read data from the region
table of the sales
schema.
In the following example, the salesanalyst
user wants to update the comment for Region key 0 and Region name AFRICA
in the region
table. But the command fails with a permission denied error because they only have select permission on the region
table in the sales
schema.
The salesanalyst
user also wants to access objects from the marketing
schema, but the command fails with a permission denied error.
Example 2: Data read/write task
In this example, the sales engineer who is responsible for building the extract, transform, and load (ETL) pipeline for data processing in the sales
schema is given read and write access to perform their tasks. For these steps, we use the dbadmin
user unless otherwise mentioned.
- Let’s create the read/write role (
sales_rw
) in thesales
schema: - Create a database user named
salesengineer
: - Grant the
sales
schema usage and select access to objects of thesales
schema to the read/write role by assigning the read-only role to it: - Now assign the user
salesengineer
to the read/write sales role:
Now the salesengineer
database user can access the sales
schema in the Amazon Redshift database using the salesengineer
credentials.
The salesengineer
user can successfully read data from the region
table of the sales
schema.
However, they can’t read tables from the marketing
schema because the salesengineer
user doesn’t have permission.
The salesengineer
user then tries to update the region
table in the sales
schema but fails to do so.
- Now, grant additional insert, update, and delete privileges to the read/write role:
The salesengineer
user then retries to update the region
table in the sales
schema and is able to do so successfully.
When they read the data, it shows that the comment was updated for Region key 0 (for AFRICA) in the region table in the sales
schema.
Now salesengineer
wants to analyze the region
table since it was updated. However, they can’t do so, because this user doesn’t have the necessary privileges and isn’t the owner of the region
table in the sales
schema.
Finally, the salesengineer
user wants to vacuum the region
table since it was updated. However, they can’t do so because they don’t have the necessary privileges and aren’t the owner of the region
table.
Example 3: Database administration task
Amazon Redshift automatically sorts data and runs VACUUM DELETE in the background.
Similarly, Amazon Redshift continuously monitors your database and automatically performs analyze operations in the background. In some situations, such as a major one-off data load, the database administrator may want to perform maintenance on objects in the sales
and marketing
schemas immediately. They access the database using dbadmin
credentials to perform these tasks.
The dbadmin
database user can access the Amazon Redshift database using their credentials to perform analyze and vacuum of the region
table in the sales
schema.
Now the dbadmin
database user accesses the Amazon Redshift database to perform analyze and vacuum of the region
table in the marketing
schema.
As part of developing the ETL process, the salesengineer
user needs to truncate the region table in the sales
schema. However, they can’t perform a truncate because they don’t have the necessary privileges, and aren’t the owner of the region
table in the sales
schema.
The dbadmin
database user can access the Amazon Redshift database to provide truncate table privileges to the sales_rw
role.
Now the salesengineer
can perform a truncate on the region table in the sales
schema successfully.
First, they read the data:
Then they perform the truncate:
They read the data again to see the changes:
For the marketing
schema, you must perform similar operations for the marketing analyst and marketing engineer. We include the following scripts for your reference. The dbadmin
user can use the following SQL commands to create the marketing roles and database users, assign privileges to those roles, and attach the users to the roles.
System functions for RBAC in Amazon Redshift
Amazon Redshift has introduced two new functions to provide system information about particular user membership and role membership in additional groups or roles: role_is_member_of
and user_is_member_of
. These functions are available to superusers as well as regular users. Superusers can check all role memberships, whereas regular users can only check membership for roles that they have been granted access to.
role_is_member_of(role_name, granted_role_name)
The role_is_member_of
function returns true if the role is a member of another role. Superusers can check all roles memberships; regular users can only check roles to which they have access. You receive an error if the provided roles don’t exist or the current user doesn’t have access to them. The following two role memberships are checked using the salesengineer
user credentials:
user_is_member_of( user_name, role_or_group_name)
The user_is_member_of
function returns true
if the user is a member of the specified role or group. Superusers can check all user memberships; regular users can only check their own membership. You receive an error if the provided identities don’t exist or the current user doesn’t have access to them. The following user membership is checked using the salesengineer
user credentials, and fails because salesengineer
doesn’t have access to salesanalyst
:
When the same user membership is checked using the superuser
credential, it returns a result:
When salesengineer
checks their own user membership, it returns the correct results:
System views for RBAC in Amazon Redshift
Amazon Redshift has added several new views to be able to view the roles, the assignment of roles to users, the role hierarchy, and the privileges for database objects via roles. These views are available to superusers as well as regular users. Superusers can check all role details, whereas regular users can only check details for roles that they have been granted access to.
For example, you can query svv_user_grants
to view the list of users that are explicitly granted roles in the cluster, or query svv_role_grants
to view a list of roles that are explicitly granted roles in the cluster. For the full list of system views, refer to SVV views.
Conclusion
In this post, we demonstrated how you can use role-based access control to further fortify your security posture by granularizing privileged access across users without needing to centralize superuser privileges in your Amazon Redshift cluster. Try out using database roles for your future Amazon Redshift implementations, and feel free to leave a comment about your experience.
In the future posts, we will show how these roles also integrate tightly with workload management. You can use them when defining WLM queues, and also while implementing single sign-on via identity federation with Microsoft Active Directory or a standards-based identity provider, such as Okta Universal Directory or Azure AD and other SAML-based applications.
About the Authors
Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.
Dipankar Kushari is a Sr. Specialist Solutions Architect, Analytics with AWS.
Harshida Patel is a Specialist Sr. Solutions Architect, Analytics with AWS.
Debu Panda is a Senior Manager, Product Management with AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases and has presented at multiple conferences such as re:Invent, Oracle Open World, and Java One. He is lead author of the EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).
Huiyuan Wang is a software development engineer of Amazon Redshift. She has been working on MPP databases for over 6 years and has focused on query processing, optimization and metadata security.