What’s New in Cassandra 2.2: JSON Support
One of the several improvements in Cassandra 2.2 is the extension of CQL to make it easier to work with JSON documents. The SELECT and INSERT statements now include a JSON-focused variant, and two new native functions have been added to convert to and from JSON.
JSON != Schemaless
When designing this feature, we wanted to ensure that users would continue to work with data in a type-safe, schema-enforced way. For that reason, working through JSON does not allow you to bypass Cassandra's schema. All data is still validated against the same types, and the schema must be manually defined up-front with a normal CREATE TABLE statement.
INSERT JSON
The INSERT statement now accepts a JSON variant. Suppose we have a table defined like this:
|
Normally we would insert a row like this:
|
The JSON version looks like this:
|
The JSON-encoded map is simply a CQL string literal that is a JSON encoding of a map where keys are column names and values are column values. This means that drivers don't need to do anything special to support INSERT JSON. For example, with the python driver, you could prepare and execute the statement like so:
|
This should work nicely with existing JSON libraries, making it easy to load documents:
|
Type Interpretation
When Cassandra types have a sensible native JSON equivalent, such as ints, floats, booleans, and lists, those native types are accepted. For Cassandra types that don't have a clear JSON equivalent, such as UUIDs, a string representation matching the normal CQL literal format should be used.
For example, in CQL you can represent a UUID with a literal like this:
|
When using INSERT JSON, you should use a string with the same format to represent the UUID:
|
Lists, sets, and tuples can all be represented by JSON lists:
|
User-defined types are represented by JSON maps:
|
Please refer to the CQL documentation for full details on the accepted JSON formats for each Cassandra type.
Omitted Columns
Columns which are omitted from the JSON value map are treated as a null insert (which results in an existing value being deleted, if one is present).
Non-text Map Keys
The JSON specification does not allow for non-text map keys. However, Cassandra's map type does support non-text keys. In order to support non-text keys, Cassandra will accept JSON-encoded string representations of any type as a map key [1] .
For example, suppose we have a table like this:
|
The map keys are ints, so we need to JSON encode them:
|
We do the same thing for more complex key types, such as set<text>:
|
|
Note that the map key is a JSON encoding of the list, not a normal CQL string literal, so double-quotes are used to surround the text items (and need to be escaped).
Case-sensitive Column Names
The INSERT JSON value map uses column names for the top-level keys. As with normal CQL, these column names are case-insensitive. So, for example, if you have a table like this:
|
The following INSERT would work just fine:
|
On the other hand, if your table is declared with case-sensitive column names, you will need to use slightly special column names in your JSON value map. Suppose our table is instead defined like this:
|
The "Age" and "Stage" columns are case-sensitive. In the JSON value map, you must match the capitalization and add an extra set of double-quotes to the column names:
|
SELECT JSON
The SELECT statement has also be extended to support retrieval of rows in a JSON-encoded map format. The results for SELECT JSON will only include a single column named [json]. This column will contain the same JSON-encoded map representation of a row that is used for INSERT JSON. For example, if we have a table like the following:
|
And we execute the following query:
|
The results will look like this in cqlsh:
|
It's also fine to use any normal selection clause. The map keys will match what the result column names would be for an equivalent non-JSON SELECT statement. For example:
|
Will return:
|
The results of SELECT JSON are designed to be usable in an INSERT JSON statement without any modifications, so all of the same rules about non-text map keys and case-sensitive column names apply.
fromJson() and toJson()
INSERT JSON and SELECT JSON are designed to work with entire rows. When you only need to use JSON for a single column, the new toJson() and fromJson() functions can be used. These behave the same as INSERT JSON and SELECT JSON, but are limited to a single value or column.
fromJson()
The fromJson() function converts a single JSON-encoded string to a normal Cassandra value. For example, this can be used when performing an update:
|
The only place where fromJson() cannot be used is the selection clause of SELECT statements. (This is because Cassandra can't know in advance what type the result will be.)
toJson()
The toJson() function is the inverse of fromJson(). It can be used to convert any column to a JSON representation. For example:
|
will return:
|
The toJson() function can only be used in the selection clause of SELECT statements.
Summary
Cassandra 2.2 makes it easier to work with JSON documents without sacrificing the benefits of schema enforcement. Try it out and let us know what you think!
Footnotes
[1]: For the sake of consistency, it will accept string representations of types anywhere, not just in map keys. However, for clarity and performance reasons I don't suggest using this unless you need to.