TechnologySeptember 4, 2013

Streaming in Cassandra 2.0

Streaming in Cassandra 2.0

“Streaming” is a component which handles data (part of SSTable file) exchange among nodes in the cluster.
When you bootstrap a new node, it gets data from existing nodes using streaming. When you run nodetool repair, nodes exchange out-of-sync data using streaming.
If you want to bulk load data from your backup set, sstableloader uses streaming to complete task.

Why we updated the streaming protocol?

As you can see, streaming is one of the core components inside Cassandra. If you have experience in operating Cassandra clusters, you have probably wondered why streaming sometimes gets stuck or is slow, and it’s hard to track down what went wrong.
Over time streaming has been improved, but to make it more reliable, traceable, and faster, we had to re-design it from the ground up.
C* version 2.0 was the best timing for re-designing streaming protocol and API.

What's changed in Streaming 2.0?

Streaming 2.0 is designed to achieve the following goals:

  • Better control
  • Better traceability
  • Better performance

For better control

Up to 1.2, there are two kinds of streaming. Stream Out, which a node sends data to another node and Stream In, which a node receives data from another node, and those are separate stream sessions even though operations like move or repair do both send and receive data.
Stream sessions are also separated from operations, so it was hard to tell what operation those streaming you saw on nodetool netstats belong.

In streaming 2.0, all streaming sessions related to certain operation(bulkload, move, bootstrap, etc.) are associated to the same Stream Plan.
As you can see in improved nodetool netstats output, you are able to see what operation is performing streams in one place.

$ bin/nodetool -h 192.168.1.141 netstats
Mode: NORMAL
Bulk Load fdf4cc70-10e9-11e3-bed0-27ba85b87bf8
    /192.168.1.163
        Receiving 3 files, 28437084 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-4-Data.db 9244384/9244384 bytes(100%) received from /192.168.1.163
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-5-Data.db 9249617/9249617 bytes(100%) received from /192.168.1.163
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-6-Data.db 5635715/9943083 bytes(56%) received from /192.168.1.163
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              0
Responses                       n/a         0              0

Here, you can see the node is receiving 3 files from /192.168.1.163 for "Bulk Load" and has Stream Plan ID of fdf4cc70-10e9-11e3-bed0-27ba85b87bf8.

For better traceability

As stated above, each Stream Plan has its own Stream Plan ID, and all the events happen during the operation are associated with this ID.
You can search through the log file with this ID to trace what is happening:

INFO [STREAM-INIT-/192.168.1.163:58320] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Received streaming plan for Bulk Load
INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Prepare completed. Receiving 3 files(28437084 bytes)
INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Session with /192.168.1.163 is complete
INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] All sessions completed

Through JMX interface, you can get status of all currently running Stream Plans. Streaming 2.0 also added feature to export stream events through JMX Notification, so it is easy to build streaming monitoring application of your own.

For better performance

In the previous versions, file transfer between two nodes happens as described below:

pre 2.0 messaging

When the file is sent, the sender waits for ACK, then proceed sending next files. The sender disconnects every time it receives ACK.

Streaming 2.0 turns above into pipeline on the same connection, so the sender does not have to wait for ACK to transfer next files.

2.0 messaging

Still we have several points that can be improved in the code for better performance, so at this point performance gain is limited, but I hope we will see more performance gain in the future.

Bonus point

Since the protocol is re-designed, we can add new feature to support streaming of older version of SSTable files. So in the future upgrade of C*, you don't need to perform upgradesstables before streaming occurs.

Further resources

Streaming API is located under the package org.apache.cassandra.streaming, and on the Wiki, you will find design doc about this new Streaming API if you are interested.

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.