Company • September 3, 2013

What’s under the hood in Cassandra 2.0

The headlining features in 2.0 are lightweight transactions, CQL enhancements, and triggers. But 2.0 also features a lot of internal optimizations and improvements!

Performance optimization

Tracking statistics on clustered columns allows eliminating unnecessary sstables from the read path.
Single-pass compaction roughly doubles compaction speed for large partitions as well as reducing the impact on the JVM heap and GC.
Leveled compaction now performs size-tiered compaction in L0 when it gets behind. This keeps read performance from deteriorating until leveling can catch back up. We've also dramatically increased LCS sstable size.
For applications still using Thrift, the new half-synchronous, half-asynchronous server based on LMAX Disruptor cuts Thrift overhead dramatically.
Faster partition index lookups and cache reads by improving performance of off-heap memory.
Faster reads of compressed data by switching from CRC32 to Adler-32 checksums.
JEMalloc support for off-heap allocation.
Removing partition-level bloom filters improves read performance by eliminating the bloom filter deserialization from each operation and reducing GC churn.
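
To illustrate the checksum change in the list above: the JDK ships both algorithms behind the common java.util.zip.Checksum interface, so swapping one for the other can be as small as constructing a different object. The sketch below is not Cassandra's code; the class name ChecksumSketch and its checksum helper are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical illustration, not Cassandra's actual implementation.
final class ChecksumSketch {

    // Computes a checksum over one chunk of bytes with whichever
    // algorithm is passed in (CRC32 and Adler32 both implement Checksum).
    static long checksum(Checksum algorithm, byte[] chunk) {
        algorithm.reset();
        algorithm.update(chunk, 0, chunk.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        // Assumed sample payload standing in for a compressed chunk.
        byte[] chunk = "example compressed chunk".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32    = %08x%n", checksum(new CRC32(), chunk));
        System.out.printf("Adler-32 = %08x%n", checksum(new Adler32(), chunk));
    }
}
```

Adler-32 was designed to be cheaper to compute in software than CRC32, at the cost of weaker error detection on very short inputs.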

Spring cleaning

Removed compatibility with pre-1.2.5 sstables and pre-1.2.9 schema. Upgrade through 1.2.9 or a later 1.2 release first.
SuperColumns are gone internally, replaced by composite cells. The SuperColumn API is retained and translated transparently to maintain backwards compatibility. (Richard Low has a good writeup of why supercolumns are obsolete.)
The potentially dangerous countPendingHints JMX call has been replaced by a Hints Created metric, which is performant enough to be monitored regularly and also eliminates the possibility of OOM-ing your node.
The on-heap partition cache has been removed, leaving only the off-heap option.
Vnodes are on by default, and the old token range bisection code for non-vnode clusters is gone. When not using vnodes, specify a token manually or one will be chosen randomly.
Removed emergency memory pressure valve logic. The intent here was to give operators enough breathing room to fix misconfigurations causing heap pressure, but it was never as reliable as we would have liked. And now that the important storage engine metadata has been moved off-heap, memory shortages will be obvious much earlier.
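
As a rough picture of the SuperColumn change in the list above, a two-level structure (supercolumn, then subcolumn) can be flattened into cells whose names combine both parts. The sketch below is a simplification under stated assumptions: the class SuperColumnSketch, the flatten method, and the ":" separator are all hypothetical, and the real engine encodes composite names in a binary format rather than as joined strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of flattening supercolumns into composite cells.
final class SuperColumnSketch {

    // Input: supercolumn name -> (subcolumn name -> value).
    // Output: composite cell name "super:sub" -> value, sorted by cell name.
    static Map<String, String> flatten(Map<String, Map<String, String>> superColumns) {
        Map<String, String> cells = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> sc : superColumns.entrySet()) {
            for (Map.Entry<String, String> sub : sc.getValue().entrySet()) {
                // The ":" separator is an assumption made for readability only.
                cells.put(sc.getKey() + ":" + sub.getKey(), sub.getValue());
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> address = new LinkedHashMap<>();
        address.put("city", "Austin");
        address.put("zip", "78701");
        Map<String, Map<String, String>> row = new LinkedHashMap<>();
        row.put("address", address);
        System.out.println(flatten(row));
    }
}
```

A translation layer in the other direction would group cells by the first component of the name, which is how a legacy API could stay available on top of a flat representation.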

Operational concerns

Java 7 is now required!
Leveled compaction level information has been moved into sstable metadata -- each sstable knows what level it's at, so there is no need for a separate manifest. This makes leveled compaction more robust and snapshots simpler.
Kernel page cache skipping has been removed in favor of optional row preheating.
Streaming has been rewritten to be more transparent and robust.
Streaming support for old-version sstables means you no longer have to manually run upgradesstables across the cluster before you can perform repairs. It also means you can bulk load old snapshots directly.
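
To see why storing the level in each sstable's metadata removes the need for a separate manifest, consider rebuilding the level layout purely from the sstables themselves. This is a minimal sketch, not the actual implementation: the SSTableInfo type, its fields, and groupByLevel are hypothetical names standing in for whatever the storage engine really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of rebuilding leveled-compaction state from per-sstable metadata.
final class LevelSketch {

    // Stand-in for the metadata carried by one sstable.
    static final class SSTableInfo {
        final String name;
        final int level;

        SSTableInfo(String name, int level) {
            this.name = name;
            this.level = level;
        }
    }

    // Groups sstable names by the level recorded in their own metadata,
    // so no external manifest file has to be read or kept in sync.
    static Map<Integer, List<String>> groupByLevel(List<SSTableInfo> sstables) {
        Map<Integer, List<String>> levels = new TreeMap<>();
        for (SSTableInfo sstable : sstables) {
            levels.computeIfAbsent(sstable.level, k -> new ArrayList<>()).add(sstable.name);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<SSTableInfo> onDisk = Arrays.asList(
                new SSTableInfo("a", 0),
                new SSTableInfo("b", 1),
                new SSTableInfo("c", 1));
        System.out.println(groupByLevel(onDisk));
    }
}
```

Because the level travels with the file, a snapshot that copies the sstables already contains everything needed to restore the leveling.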
