Technology | March 17, 2015

Role Based Access Control In Cassandra

Sam Tunnicliffe

Cassandra has some great advantages over mainstream database systems in core areas such as scaling, resilience and performance. However, given its relative youth, it has understandably lagged behind products with decades of development in a few places. One of those is permissions management, where the upcoming 2.2 release will continue to narrow the feature gap by adding significant new functionality to the authentication and authorization subsystem.

Introducing Roles

Cassandra has supported pluggable user and permissions management since its very early versions and this has evolved significantly over time. In 1.2.2 we began including CQL based, internal authenticator and authorizer implementations in the core distribution. As part of a broad reworking of the auth subsystem, Cassandra 2.2 will introduce a number of further enhancements in this area.

One specific improvement is to replace the simplistic approach of managing permissions on an individual user basis with something much more powerful and flexible: role based access control (RBAC). Under this new scheme, permissions are granted to a role just as they were previously granted to a user; the key difference is that roles can also be granted to each other. In this context we can think of roles as groups rather than individuals. This greatly simplifies permissions management for administrators by allowing related privileges to be bundled together by granting them to roles, which can in turn be assigned to specific database users. Some new constructs have been added to the CQL syntax to support this. For example, a simple scenario looks something like this:

CREATE KEYSPACE warehouse WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
USE warehouse;
CREATE TABLE addresses (
    customer_id bigint,
    address_id int,
    address text,
    PRIMARY KEY (customer_id, address_id)
);
CREATE TABLE orders (
    customer_id bigint,
    order_id timeuuid,
    product_id uuid,
    product_description text,
    PRIMARY KEY (customer_id, order_id, product_id)
);
CREATE ROLE supervisor;
GRANT MODIFY ON warehouse.orders TO supervisor;
GRANT SELECT ON warehouse.addresses TO supervisor;

So now we have a role, supervisor, with permission to modify data in the orders table and to read from the addresses table. When we have a new database user that we want to be able to act as a supervisor, we just grant them that role.

CREATE ROLE pam WITH PASSWORD = 'password' AND LOGIN = true;
GRANT supervisor TO pam;

Let's examine those last two statements. The first creates another role, named pam, and sets its LOGIN attribute to true. As you might expect, this is what enables a database user to actually identify as this role when logging in via a client such as cqlsh. We also assigned Pam a password, as we're using Cassandra's internal password authentication mechanism. There's one other attribute we could set when creating a new role: superuser status is specified at the role level, which we would do by adding AND SUPERUSER = true to the CREATE ROLE statement. Finally, note that anything that can be set in CREATE ROLE can be modified later using ALTER ROLE (so we could retrospectively make Pam a superuser if we chose). Pam is now permitted to do all the things a supervisor can do:

LIST ALL PERMISSIONS OF pam;
+------------+------------+-----------------------------+------------+
| role       | username   | resource                    | permission |
+------------+------------+-----------------------------+------------+
| supervisor | supervisor | <table warehouse.addresses> | SELECT     |
| supervisor | supervisor | <table warehouse.orders>    | MODIFY     |
+------------+------------+-----------------------------+------------+

(The username column is included simply to provide backward compatibility with the results of LIST PERMISSIONS in previous releases.)
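
As a minimal sketch of the role attributes discussed above, the statements below create a role with superuser status and then change attributes on an existing role with ALTER ROLE. The role name dba and the password values are hypothetical, and the statements are assumed to be issued by a role that is itself a superuser.

```cql
-- Hypothetical role created with superuser status from the outset.
CREATE ROLE dba WITH PASSWORD = 'change_me' AND LOGIN = true AND SUPERUSER = true;

-- Retrospectively promote pam, then change her password.
ALTER ROLE pam WITH SUPERUSER = true;
ALTER ROLE pam WITH PASSWORD = 'a_new_password';

-- Revert the promotion once it is no longer needed.
ALTER ROLE pam WITH SUPERUSER = false;
```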

If we were to add a new table to which supervisors require access, we would simply grant the necessary permissions on it to the supervisor role and Pam, along with all other users assigned the role, would automatically acquire them.
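
For instance, assuming a hypothetical shipments table is later added to the warehouse keyspace, a pair of grants on the role would be enough; nothing needs to change for pam or any other individual user. The table name and its columns are illustrative only.

```cql
-- Hypothetical new table; not part of the original schema.
CREATE TABLE warehouse.shipments (
    order_id timeuuid,
    shipped_at timestamp,
    carrier text,
    PRIMARY KEY (order_id)
);

-- Granting to the role makes the permissions available to every user holding it.
GRANT SELECT ON warehouse.shipments TO supervisor;
GRANT MODIFY ON warehouse.shipments TO supervisor;
```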

We can go further, though. Let's create another role and grant it some permissions on all of the tables in another keyspace. Then, we'll assign our new role to Pam.

CREATE ROLE office_admin;
GRANT SELECT ON KEYSPACE office TO office_admin;
GRANT MODIFY ON KEYSPACE office TO office_admin;
GRANT office_admin TO pam;

And if we list Pam's permissions, we'll see they represent the aggregate of those granted to her roles.

LIST ALL PERMISSIONS OF pam;
+--------------+--------------+-----------------------------+------------+
| role         | username     | resource                    | permission |
+--------------+--------------+-----------------------------+------------+
| office_admin | office_admin | <keyspace office>           | SELECT     |
| office_admin | office_admin | <keyspace office>           | MODIFY     |
| supervisor   | supervisor   | <table warehouse.addresses> | SELECT     |
| supervisor   | supervisor   | <table warehouse.orders>    | MODIFY     |
+--------------+--------------+-----------------------------+------------+

Likewise, we can ask which roles Pam has been assigned.

LIST ROLES OF pam;
+--------------+-------+-------+---------+
| role         | super | login | options |
+--------------+-------+-------+---------+
| office_admin | False | False | {}      |
| pam          | False | True  | {}      |
| supervisor   | False | False | {}      |
+--------------+-------+-------+---------+

Inheritance and Hierarchies

As you can see, roles inherit the permissions of any other roles that they are granted. In the example above, the hierarchy of roles is extremely simple, but that need not be the case. It is perfectly possible to construct a much deeper structure, meaning admins can make permissions as fine grained as necessary without incurring a huge administrative burden. One last thing to note regarding inheritance: whilst permissions and superuser status are inherited, the LOGIN attribute is not. In order for database users to identify as a particular role at login, that role must have its LOGIN attribute set to true. This prevents users from inadvertently logging in under the identity of a group, like supervisor.
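
A deeper hierarchy might be sketched as follows, building on the supervisor and office_admin roles from the earlier examples. The regional_manager and dave roles and the password are hypothetical.

```cql
-- Hypothetical group role that bundles two existing roles.
CREATE ROLE regional_manager;
GRANT supervisor TO regional_manager;
GRANT office_admin TO regional_manager;

-- Hypothetical login role; LOGIN must be set here because it is not inherited.
CREATE ROLE dave WITH PASSWORD = 'change_me' AND LOGIN = true;
GRANT regional_manager TO dave;

-- All roles dave holds, directly or through inheritance.
LIST ROLES OF dave;

-- Only the roles granted to dave directly.
LIST ROLES OF dave NORECURSIVE;
```

Revoking a role works the same way in reverse, for example REVOKE office_admin FROM regional_manager.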

Automatic Granting of Permissions

Another interesting aspect to this is that the creator of a role (that is, the role that the database user issuing the CREATE ROLE statement is logged in as) is automatically granted permissions on it. This enables users with role-creation privileges to also manage the roles they create, allowing them to ALTER, DROP, GRANT and REVOKE them. This automatic granting of 'ownership' permissions isn't limited to roles either; it also applies to database objects such as keyspaces and tables (and, soon, user defined functions). This largely removes the requirement to have any active superuser roles, which reduces the risk of privilege escalation. See CASSANDRA-7216 and CASSANDRA-8650 for full details.
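
A sketch of how this might look in practice, assuming a superuser has first given pam permission to create roles. The temp_staff role is hypothetical, and the comments mark which identity is assumed to issue each statement.

```cql
-- Issued by a superuser: allow pam to create new roles.
GRANT CREATE ON ALL ROLES TO pam;

-- Issued while logged in as pam.
CREATE ROLE temp_staff;

-- Inspect the permissions pam holds on the role she just created.
LIST ALL PERMISSIONS ON ROLE temp_staff OF pam;

-- As the creator, pam can manage the role, including removing it.
DROP ROLE temp_staff;
```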

Under the Hood

At the implementation level, one aspect of this rework is to clarify the responsibilities of the various components. For instance, the methods handling user management have been moved from IAuthenticator to the new IRoleManager interface, leaving IAuthenticator implementations responsible purely for validation of credentials supplied during login. A nice side effect of this is that where an external authentication mechanism is used, we no longer have the requirement to create and manage users/roles directly in Cassandra as well as in the external system. By providing a custom IRoleManager implementation, user management and authentication can be completely delegated.

Of course, these changes to the user management model do require implementers of custom auth providers to make some changes to their code, but these should be fairly limited and straightforward. Check out the changes to PasswordAuthenticator.java and CassandraAuthorizer.java as well as the new CassandraRoleManager.java class in the 2.2 source tree for some pointers.

Upgrading

For systems already using the internal auth implementations, the process for converting existing data during a rolling upgrade is straightforward. As each node is restarted, it will attempt to convert any data in the legacy tables into the new schema. Until enough nodes to satisfy the replication strategy for the system_auth keyspace are upgraded and so have the new schema, this conversion will fail, with the failure being reported in the system log. During the upgrade, Cassandra's internal auth classes will continue to use the legacy tables, so clients experience no disruption. Issuing DCL statements during an upgrade is not supported. Once all nodes are upgraded, an operator with superuser privileges should drop the legacy tables system_auth.users, system_auth.credentials and system_auth.permissions. Doing so will prompt Cassandra to switch over to the new tables without requiring any further intervention.

A successful data conversion will report in system.log like so:

INFO  [OptionalTasks:1] CassandraRoleManager.java:410 - Converting legacy users

INFO  [OptionalTasks:1] CassandraRoleManager.java:420 - Completed conversion of legacy users

INFO  [OptionalTasks:1] CassandraRoleManager.java:425 - Migrating legacy credentials data to new system table

INFO  [OptionalTasks:1] CassandraRoleManager.java:438 - Completed conversion of legacy credentials

INFO  [OptionalTasks:1] CassandraAuthorizer.java:396 - Converting legacy permissions data

INFO  [OptionalTasks:1] CassandraAuthorizer.java:435 - Completed conversion of legacy permissions

While the legacy tables are present, a restarted node will re-run the data conversion and report the outcome, so that operators can verify that it is safe to drop them.
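
Once every node has been upgraded and the logs confirm a successful conversion, the clean-up step described above amounts to the following statements, issued by a role with superuser privileges. This is a sketch of that final step only; the table names are those given in the text.

```cql
-- Dropping the legacy tables prompts Cassandra to switch to the new ones.
DROP TABLE system_auth.users;
DROP TABLE system_auth.credentials;
DROP TABLE system_auth.permissions;
```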

As I mentioned, this is just a part of a wider reworking of the auth subsystem in Cassandra planned for inclusion in the 2.2 release. You can check out more detail and follow progress in CASSANDRA-8394.
