What’s new in Cassandra 1.1: live traffic sampling

Cassandra 1.0 introduced an alternate compaction strategy, LeveledCompactionStrategy, in addition to the existing SizeTieredStategy. Deciding which one to choose can be difficult however, since it can be dependent on the behavior of your application and your requirements. You can reason about these factors to make your decision, but having empirical evidence is always reassuring.

Unfortunately, this is an especially pernicious thing to test fully. You can copy the live data from an existing node to an isolated one and then switch the compaction strategy and wait for compaction to quiesce, but this takes a long time, since switching strategies requires re-compacting all of the data to represent the new strategy. This will also only let you perform static analysis -- total data size, duplicate rows, etc. You won't know for instance if one strategy would become totally overrun at some point in a live situation.

Further complicating the matter, there is no good way to replay your exact write scenario. You could reverse-engineer the write rates from timestamps in the data, but this is not easy to do, and you still won't know exactly in what manner the data was written -- one column at a time, or batches of some size. You can infer some of this from your application of course, but this quickly becomes a project of significant magnitude in itself, when all you want to do is find the right compaction strategy for your deployment.

Fortunately, Cassandra 1.1 introduces a new write survey mode, allowing you to sample live traffic to a test node without impacting your existing cluster much -- equivalent to the impact of bootstrapping a new node. To understand how this works, let's first recall a summary of the bootstrap process:

The joining node contacts a seed and then sleeps for RING_DELAY to learn about all other nodes
It either uses the specified token or chooses one and announces the range it will take for RING_DELAY
At this point, existing node(s) responsible for the new range forward incoming writes for that range to the joining node
The joining node then asks existing node(s) responsible for the new range to stream existing data to it
Once completed, the new node builds indexes and bloom filters for the new data
Finally, it announces to the ring that it is now a full member and begins serving reads and participating in the cluster

The first thing to notice in this process is that it's very light on the existing nodes. All they have to do is stream some subset of their data and forward a subset of their incoming writes. This also makes implementing write survey mode very simple; we just don't do the final step of announcing that the new node is a member of the ring. It will continue to receive forwarded writes as long as it is up, and when you're done testing you can simply kill it without any impact to the cluster, since a node that is not a full member is automatically removed after it has been down for a while.

To exploit this new feature to compare compaction strategies, the following steps would be used:

configure the test node as you would any new node in the cluster
add the -Dcassandra.write_survey=true flag to the JVM (conf/cassandra-env.sh is a good place to do this)
start the node
while it is waiting on RING_DELAY, set the compaction strategy via JMX by modifying org.apache.cassandra.db.[keyspace].[column family].CompactionStrategyClass. This needs to be a fully qualified compaction strategy class name.

At this point, you can let the node run for as long as you like and it will continue to receive writes destined for its range as if it were to fully join. When your testing is complete, you can simply kill the node, or if you prefer to go ahead and add it to your cluster, call org.apache.cassandra.db.StorageService.joinRing() via JMX. It's important to note that the compaction strategy is part of the schema, which is why you must change it via JMX so it's only changed on the test node, not the entire ring.

I hope this aids you in choosing your compaction strategy, as well as testing other settings. In the future, we plan to add support for testing compression settings this way as well, since it's a natural fit.