Technology | March 11, 2013

A Look at Data Auditing in DataStax Enterprise 3.0

Robin Schumacher

DataStax Enterprise 3.0 and above supports data auditing on a database cluster. Data auditing allows an administrator to understand “who looked at what/when” and “who changed what/when”. In short, it enables the logging of some or all of the user activity that occurs on a database.

Many businesses and organizations today have either external mandates or internal security policies that require the auditing of user actions on a database, so having a built-in way to accomplish this with DataStax Enterprise is helpful to administrators in such environments.

Let’s take a brief look at how to set up and use data auditing in DataStax Enterprise.

Getting Started

Auditing is implemented in DataStax Enterprise via the log4j mechanism that’s built into the product. This approach is an efficient way of auditing large amounts of activity on a cluster. It also gives the administrator a good deal of flexibility over what is audited, where the data is written, and how it is presented.

Data auditing is disabled by default in DataStax Enterprise, which means no auditing occurs. To enable auditing, an administrator edits the log4j-server.properties file, which is found under the DataStax Enterprise installation directory in resources/cassandra/conf/.

In the log4j-server.properties file, there is a commented-out section designated for data auditing. The general parameters for log-based auditing are the following:

# audit log
#log4j.logger.DataAudit=INFO, A
#log4j.additivity.DataAudit=false
#log4j.appender.A=org.apache.log4j.RollingFileAppender
#log4j.appender.A.File=/var/log/cassandra/audit.log
#log4j.appender.A.bufferedIO=true
#log4j.appender.A.maxFileSize=200MB
#log4j.appender.A.maxBackupIndex=5
#log4j.appender.A.layout=org.apache.log4j.PatternLayout
#log4j.appender.A.layout.ConversionPattern=%m%n
#log4j.appender.A.filter.1=com.datastax.bdp.cassandra.audit.AuditLogFilter
#log4j.appender.A.filter.1.ActiveCategories=ALL
#log4j.appender.A.filter.1.ExemptKeyspaces=do_not_log

The first five log4j lines in the file control the enablement of auditing in the following fashion:

  • The first line (right after the # audit log line) defines the log4j logger and gives it a name
  • The second ensures the root log4j appender is not used
  • The third pushes the auditing data through a rolling file appender
  • The fourth line sets the location of the audit file
  • The fifth line enables buffered IO for the best throughput, which means the most recent audit events may not yet be visible when the audit file is opened. However, the plugin ensures that all buffered events are flushed when the server is shut down

These lines also need to be uncommented to set the layout of the audit log file and attach the audit filter:

log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%m%n
log4j.appender.A.filter.1=com.datastax.bdp.cassandra.audit.AuditLogFilter
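
Uncommenting the audit block by hand is simple, but the edit can also be scripted when it has to be repeated on many nodes. The following is a minimal sketch, not part of DataStax Enterprise: the function name enable_audit_section is hypothetical, and it assumes the audit block starts at the "# audit log" marker and ends at the first line that is not a log4j setting, as in the listing above.

```python
def enable_audit_section(properties_text: str) -> str:
    """Return the properties text with the audit block uncommented.

    Assumption: the block begins at a '# audit log' marker line and ends
    at the first following line that is not a (commented) log4j setting.
    Lines outside that block are left untouched.
    """
    out = []
    in_audit = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped.replace(" ", "").lower() == "#auditlog":
            in_audit = True                      # marker line, keep as-is
        elif in_audit and stripped.startswith("#log4j."):
            line = stripped[1:]                  # drop the leading '#'
        elif in_audit and not stripped.startswith("log4j."):
            in_audit = False                     # blank or unrelated line ends the block
        out.append(line)
    trailing = "\n" if properties_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

A cautious way to use this is to read log4j-server.properties, write the result to a new file, and compare the two before replacing the original.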

Finally, the other parameter that requires attention is the ActiveCategories setting, which controls the granularity of auditing. An administrator can audit everything or only certain types of actions, using the following options:

  • QUERY – all SELECT operations
  • DML – all INSERT, UPDATE, and DELETE actions
  • DDL – all object creations, modifications, and drops
  • DCL – all user creations, drops, list users and security privilege grants/revokes
  • AUTH – login events
  • ADMIN – audits various admin commands such as describe schema versions, cluster name, version, ring, etc.
  • ALL – audits DDL, DML, queries and errors

The parameter’s default is ALL.

If an administrator wants to selectively exclude certain Cassandra keyspaces from being audited, they can list them in the ExemptKeyspaces parameter.
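
To make the two filter parameters concrete, here is a small sketch that builds the corresponding property lines from a chosen set of categories and exempt keyspaces. The helper audit_filter_lines is hypothetical, and the sketch assumes that multiple values are written as a comma-separated list; verify the separator against the product documentation before relying on it.

```python
# Category names as listed in this article.
VALID_CATEGORIES = {"QUERY", "DML", "DDL", "DCL", "AUTH", "ADMIN", "ALL"}


def audit_filter_lines(categories, exempt_keyspaces=()):
    """Build the ActiveCategories / ExemptKeyspaces property lines.

    Assumption: multiple values are joined with commas.
    """
    cats = [c.strip().upper() for c in categories]
    unknown = sorted(set(cats) - VALID_CATEGORIES)
    if unknown:
        raise ValueError("unknown audit categories: " + ", ".join(unknown))
    if "ALL" in cats:
        cats = ["ALL"]  # ALL makes any other entry redundant
    lines = ["log4j.appender.A.filter.1.ActiveCategories=" + ",".join(cats)]
    if exempt_keyspaces:
        lines.append(
            "log4j.appender.A.filter.1.ExemptKeyspaces=" + ",".join(exempt_keyspaces)
        )
    return lines


if __name__ == "__main__":
    # Audit only reads and writes, and skip a keyspace named 'scratch'.
    for line in audit_filter_lines(["QUERY", "DML"], ["scratch"]):
        print(line)
```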

Auditing User Activity

Once the log4j audit section has been properly configured, the server needs to be restarted for the changes to take effect. After that, whatever activity occurs on that node in the cluster will be audited.

For example, with the ALL option set for auditing, the following activity:

$ ./cqlsh -3 localhost -u cassandra -p cassandra
Connected to Test Cluster at localhost:9160.

[cqlsh 2.2.0 | Cassandra 1.1.8.1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.33.0]

Use HELP for help.

cqlsh> use dev;

cqlsh:dev> select * from emp;
empid | first_name | last_name  | ssn
------+------------+------------+-----------
1     |      robin | schumacher | 123456789
2     |      laura |       jung | 213456789
cqlsh:dev> insert into emp (empid, first_name, last_name, ssn)
... values (3, 'john','smith',213657898);
cqlsh:dev> delete from emp where empid=3;

Produces this audit output:

$ more audit.log
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737912619|category:AUTH|type:LOGIN|operation:Successful login for user - cassandra
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737912623|category:ADMIN|type:DESC_VERSION
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737912642|category:DML|type:SET_KS|ks:system|operation:USE system;
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737912650|category:ADMIN|type:DESC_CLUSTER_NAME
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737912654|category:QUERY|type:CQL_SELECT|ks:system|cf:Versions|operation:select component, version from system."Versions"
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737928768|category:DML|type:SET_KS|ks:dev|operation:use dev;
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359737933896|category:QUERY|type:CQL_SELECT|ks:dev|cf:emp|operation:select * from emp;
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359738484084|category:DML|type:CQL_UPDATE|ks:dev|cf:emp|operation:insert into emp (empid, first_name, last_name, ssn)
values (3, 'john','smith',213657898);
host:robinsmac.local/192.168.24.2|source:/127.0.0.1|user:cassandra|timestamp:1359738490932|category:DML|type:CQL_DELETE|ks:dev|cf:emp|operation:delete from emp where empid=3;
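
Because each audit event is a pipe-delimited list of key:value pairs, the log is easy to post-process. The sketch below parses events into dictionaries. It rests on assumptions drawn only from the sample output above: every event starts with "host:", the operation field (when present) is the last field, a line that does not start with "host:" continues the previous event's operation text, and the timestamp is milliseconds since the Unix epoch. The function names are illustrative.

```python
from datetime import datetime, timezone


def parse_record(text):
    """Parse one audit event (possibly multi-line) into a dict."""
    record = {}
    fields = text.split("|")
    for i, field in enumerate(fields):
        key, _, value = field.partition(":")
        if key == "operation":
            # Assumed to be the last field; keep any '|' or ':' inside it.
            record[key] = "|".join(fields[i:]).partition(":")[2]
            break
        record[key] = value
    if "timestamp" in record:
        # Assumption: epoch milliseconds.
        record["time"] = datetime.fromtimestamp(
            int(record["timestamp"]) / 1000, tz=timezone.utc
        )
    return record


def iter_audit_records(lines):
    """Yield one dict per event, joining continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("host:"):
            if current is not None:
                yield parse_record(current)
            current = line
        elif current is not None and line:
            current += "\n" + line  # multi-line statement, as in the insert above
    if current is not None:
        yield parse_record(current)
```

Passing an open file handle for audit.log to iter_audit_records and filtering on record["category"] or record["user"] gives a quick answer to “who changed what/when”.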

Writing Audit Data to Tables / Column Families

If an administrator chooses, they can write audit data directly to a table / column family, although the overhead is a little higher than when using only log files. The same configuration file is used to direct audit activity to a particular keyspace and table rather than to a log file.

Note that if the audit data is deemed sensitive, DSE’s built-in encryption can be used to encrypt the table being used to hold the audit information.

Auditing Hadoop and Solr Activities

Auditing Hadoop and Solr user activities is possible in DataStax Enterprise. For Hadoop, all activities that occur against the Cassandra column families used to house Hadoop data can be audited.

For Solr, in addition to the audit setup described earlier, an admin will need to edit the web.xml file located under the installation directory at /resources/solr/web/solr/WEB-INF/web.xml and uncomment the filter-mapping section contained in the file:

<!--  To enable audit logging of Solr HTTP requests, enable this filter-mapping 
      which includes the audit logging filter in the filter chain. The audit logging config in DSE's log4j configuration file must also be enabled.    -->
<!--
<filter-mapping>
<filter-name>DseAuditLoggingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
-->
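
After editing web.xml, it can be useful to confirm that the filter-mapping is really active rather than still inside a comment. The following sketch does that with Python's standard XML parser, which discards comments by default. The function name is hypothetical, and only the element and filter names shown in the snippet above are taken from the product.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}filter-mapping'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def audit_filter_mapping_enabled(web_xml_text, filter_name="DseAuditLoggingFilter"):
    """Return True if an uncommented filter-mapping names the audit filter."""
    root = ET.fromstring(web_xml_text)
    for element in root.iter():
        if local_name(element.tag) != "filter-mapping":
            continue
        for child in element:
            if (
                local_name(child.tag) == "filter-name"
                and (child.text or "").strip() == filter_name
            ):
                return True
    return False
```

The check only inspects the XML; the audit logging section of the log4j configuration still has to be enabled as described earlier.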

Other Considerations

A few other things to keep in mind when using data auditing in DataStax Enterprise:

  • All nodes must have auditing turned on so that the total activity for a cluster may be collected.
  • The overhead of auditing depends on several factors, including the intensity of concurrent user activity and the level of auditing used.
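
Since each node writes its own audit file, seeing the total activity for a cluster means combining the per-node logs. The sketch below merges audit lines from several nodes into one list ordered by timestamp. It is an illustration only: the function name is hypothetical, how the files are collected from the nodes is left out, and it assumes the line layout shown in the sample audit output (events start with "host:" and carry a numeric timestamp field).

```python
import re

# Matches the numeric timestamp field of an audit event line.
TIMESTAMP = re.compile(r"\|timestamp:(\d+)\|")


def merge_node_logs(node_logs):
    """Merge audit lines from several nodes, ordered by timestamp.

    node_logs: an iterable of iterables of lines, one per node.
    Lines that do not start a new event are treated as continuations
    of the previous event from the same node.
    """
    events = []
    for lines in node_logs:
        node_events = []
        for raw in lines:
            line = raw.rstrip("\n")
            match = TIMESTAMP.search(line) if line.startswith("host:") else None
            if match:
                node_events.append((int(match.group(1)), line))
            elif node_events and line:
                stamp, previous = node_events[-1]
                node_events[-1] = (stamp, previous + "\n" + line)
        events.extend(node_events)
    # sorted() is stable, so events with equal timestamps keep their input order.
    return [line for _, line in sorted(events, key=lambda event: event[0])]
```

The host field at the start of every event still identifies which node recorded it after the merge.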

Next Steps

To give data auditing and all the other features of DataStax Enterprise a try, download a copy today. DataStax Enterprise is completely free to use in development environments with no restrictions; production deployments, however, require a purchased subscription.

For more information on all of 3.0's security features, please see our online documentation and our “What’s New in DataStax 3.0?” white paper.
