What’s new in Cassandra 0.7: expiring columns
Deprecation warning
This post covers the obsolete Cassandra 0.7. Modern Cassandra manipulates expiring columns using CQL.
Original post
Sometimes, data comes with an expiration date, either by its nature or because it's simply intractable to keep all of a rapidly growing dataset indefinitely.
In most databases, the only way to deal with such expiring data is to write a periodic job that deletes whatever has expired. Unfortunately, this is usually both error-prone and inefficient: not only do you have to issue a high volume of deletions, but you often also have to scan through lots of data to find what has expired.
Fortunately, Cassandra 0.7 has a better solution: expiring columns. Whenever you insert a column, you can specify an optional TTL (time to live) for that column. When you do, the column will expire after the requested amount of time and be deleted auto-magically (though asynchronously -- see below). Importantly, this was designed to be as low-overhead as possible.
An example using the CLI
First, the setup:
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] create keyspace demo;
[default@unknown] use demo;
[default@demo] create column family test with comparator=UTF8Type and default_validation_class=UTF8Type;
Let's now insert two columns, one standard and one with a 60-second TTL:
[default@demo] set test[row1][col1] = 'val1';
[default@demo] set test[row1][col2] = 'val2' with ttl=60;
[default@demo] get test[row1];
=> (column=col1, value=val1, timestamp=1291980736812000)
=> (column=col2, value=val2, timestamp=1291987837942000, ttl=60)
Returned 2 results.
Fine. Now is a good time for you to go grab a cup of coffee, since we're going to wait 60s for the second column to expire. ... Done? All right, let's look at this row again:
[default@demo] get test[row1];
=> (column=col1, value=val1, timestamp=1291980736812000)
The second column has been deleted as requested.
Now, what if you want to change the TTL of a column? Remember that with Cassandra, the insertion of a column is really an "insertion or update" operation, depending on whether a previous version of the column exists or not.
This still holds for expiring columns, so to update the TTL you have to re-insert the column with a new TTL. This means that if you want to update the TTL of a column whose value you don't know, you have to read the column first and then insert it back with the new TTL.
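To make the read-then-reinsert pattern concrete, here is a minimal sketch in Python. It does not talk to Cassandra at all: ToyColumnFamily and change_ttl are hypothetical names invented for this illustration, and the class is only an in-memory stand-in that mimics the "insertion or update" behavior described above.

```python
import time


class ToyColumnFamily:
    """Hypothetical in-memory stand-in for a column family (not a Cassandra client)."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the expiry logic can be tested
        self._rows = {}      # row key -> {column name -> (value, expires_at or None)}

    def insert(self, row, name, value, ttl=None):
        # Insert-or-update: a new write replaces the value and the TTL together.
        expires_at = self._clock() + ttl if ttl and ttl > 0 else None
        self._rows.setdefault(row, {})[name] = (value, expires_at)

    def get(self, row, name):
        entry = self._rows.get(row, {}).get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired columns are no longer returned by reads
        return value


def change_ttl(cf, row, name, new_ttl):
    """Change a column's TTL by reading its value and writing it back."""
    value = cf.get(row, name)
    if value is None:
        return False  # nothing live to re-insert
    cf.insert(row, name, value, ttl=new_ttl)
    return True
```

With a real client the shape is the same: one read to fetch the current value, then one insert carrying that value and the new TTL.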
Programmatically
In Thrift, a new ttl field has been added to the Column structure, which now looks like:
struct Column {
    1: required binary name,
    2: required binary value,
    3: required i64 timestamp,
    4: optional i32 ttl,
}
You simply set the ttl field to the number of seconds you want the column to last, and insert that. Note that setting the value to 0 or any negative number is equivalent to not setting it: the column will never expire. Whenever you query a column, the ttl field will be set if and only if you had set a strictly positive TTL when the column was inserted.
It is worth noting that the TTL does not depend on the column timestamp, and hence imposes no constraint on the timestamp. However, it is a good idea to use the current time as the timestamp because it gives you a way to work out when a given column will expire. More precisely, if you set the timestamp at insertion time using microsecond precision, a given column expires at timestamp + (ttl * 1000000).
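The arithmetic above can be sketched in a few lines of Python. The helper names (normalize_ttl, expiration_micros, seconds_remaining) are made up for this example, and the sketch assumes the timestamp was set to the current time in microseconds when the column was inserted.

```python
MICROS_PER_SECOND = 1_000_000


def normalize_ttl(ttl):
    """Treat a missing, zero, or negative TTL as 'never expires'."""
    return ttl if ttl is not None and ttl > 0 else None


def expiration_micros(timestamp_micros, ttl):
    """Return the expiration time in microseconds, or None if the column never expires."""
    ttl = normalize_ttl(ttl)
    if ttl is None:
        return None
    return timestamp_micros + ttl * MICROS_PER_SECOND


def seconds_remaining(timestamp_micros, ttl, now_micros):
    """Return the seconds left before expiry (0.0 once expired), or None if no TTL."""
    expires = expiration_micros(timestamp_micros, ttl)
    if expires is None:
        return None
    return max(0.0, (expires - now_micros) / MICROS_PER_SECOND)


# col2 from the CLI example: timestamp=1291987837942000, ttl=60
print(expiration_micros(1291987837942000, 60))  # 1291987897942000
```

Keep in mind that this only approximates what the server does: the actual expiration time is computed on the server when it receives the insert, with one-second precision.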
How it works
If you want to expire data in a database, you don't have much choice: you need a periodic task that somehow finds expired data and removes it. With lots of data, keeping this efficient can be a challenge. Cassandra already has a natural place for that kind of work: compaction. Compaction already goes through your data periodically, throwing away old versions of it, so it is easy and cheap to use it for data expiration as well.
When you insert an expiring column, the coordinator node computes when the column will expire and stores this information internally as part of the column structure. As long as the column is live, it acts exactly like a standard column. When the column expires, nothing changes immediately except that the column is considered dead and is not returned by queries anymore--no disk space is freed yet.
The first time the expired column is compacted, it is transformed into a tombstone. This transformation frees some disk space: the size of the value of the expired column. From that moment on, the column is a normal tombstone and follows the tombstone rules: it will be totally removed by compaction (including minor ones in most cases since Cassandra 0.6.6) after GCGraceSeconds.
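The life cycle described above can be summarized as a small state function. This is a simplified model in Python, not Cassandra's actual code: ColumnState and column_state are hypothetical names, all times are plain seconds, and the model assumes that GCGraceSeconds is counted from the expiration time and that any compaction running after that point purges the column.

```python
from enum import Enum


class ColumnState(Enum):
    LIVE = "live"            # returned by queries
    EXPIRED = "expired"      # hidden from queries, still fully on disk
    TOMBSTONE = "tombstone"  # value dropped by the first compaction after expiry
    PURGED = "purged"        # removed entirely by a compaction after GCGraceSeconds


def column_state(now, expires_at, gc_grace_seconds, compaction_times):
    """Return the state of an expiring column at time `now` (simplified model)."""
    if now < expires_at:
        return ColumnState.LIVE
    # Only compactions that ran after the expiration (and up to now) matter.
    relevant = [t for t in compaction_times if expires_at <= t <= now]
    if not relevant:
        return ColumnState.EXPIRED
    # Assumption: the grace period starts at the expiration time.
    if any(t >= expires_at + gc_grace_seconds for t in relevant):
        return ColumnState.PURGED
    return ColumnState.TOMBSTONE
```

For example, with expires_at=100, gc_grace_seconds=50, and compactions at times 120 and 200, the column is live at 90, expired at 110, a tombstone at 130, and purged at 210. Note that time passing is never enough on its own: each transition after expiry happens only when a compaction actually runs.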
Implementation implications
A few remarks to keep in mind when using expiring columns:
- As explained above, when a column expires, the disk space it uses is not freed immediately. Some space is freed by the first compaction that happens after the expiration, but the column is fully removed only by a compaction that runs after GCGraceSeconds has elapsed.
- The expiration has a precision of one second, as calculated on the server. Thus, a very small TTL probably does not make much sense. Moreover, the clocks on the servers should be synchronized; otherwise, the precision could in theory be worse (since the expiration time is computed on the primary host receiving the initial insertion but then interpreted by the other hosts of the cluster).
- Compared to standard columns, each expiring column has an additional overhead of 8 bytes in memory and on disk (to record the TTL and expiration time).