The role of replication
Cassandra delivers high availability for writing through its data replication strategy. Cassandra duplicates data on multiple peer nodes to ensure reliability and fault tolerance. Relational databases, on the other hand, typically structure tables to keep data duplication at a minimum. The relational database server has to do additional work to ensure data integrity across the tables. In Cassandra, maintaining integrity between related tables is not an issue. Cassandra tables are not related. Usually, Cassandra performs better on writes than relational databases.
About the write path¶
When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends writes to the commit log on disk, providing configurable durability.
The commit log receives every write made to a Cassandra node, and these durable writes survive permanently even after hardware failure.
The more a table is used, the larger its memtable needs to be. Cassandra can dynamically
allocate the right amount of memory for the memtable or you can manage the amount of memory
being utilized yourself. When memtable contents exceed a
, the memtable data, which includes indexes, is put in a queue to
be flushed to disk. You can configure the length of the queue by changing
memtable_flush_queue_size in the cassandra.yaml. If the data to be flushed exceeds the queue
size, Cassandra blocks writes. The memtable data is flushed to
on disk using sequential I/O. Data in the commit log is purged after its
corresponding data in the memtable is flushed to the SSTable.
Memtables and SSTables are maintained per table. SSTables are immutable, not written to again after the memtable is flushed. Consequently, a row is typically stored across multiple SSTable files.
For each SSTable, Cassandra creates these in-memory structures:
A list of primary keys and the start position of rows in the data file.
A subset of the partition index. By default 1 primary key out of every 128 is sampled.