Technology, November 16, 2012

Advanced request tracing in Cassandra 1.2

Accessing saved trace data

In my first post on request tracing in Cassandra, you may have noticed some extra output from cqlsh like this:

 

Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 

Cassandra automatically saves trace sessions to the system_traces keyspace for reference. Trace data expires after 24 hours; if you want to keep a session's results longer than that, you'll need to copy them to a more permanent home.

The system_traces keyspace contains two tables, sessions and events:

 

CREATE TABLE sessions (
  session_id uuid PRIMARY KEY,
  coordinator inet,
  duration int,
  parameters map<text, text>,
  request text,
  started_at timestamp
);

CREATE TABLE events (
  session_id uuid,
  event_id timeuuid,
  activity text,
  source inet,
  source_elapsed int,
  thread text,
  PRIMARY KEY (session_id, event_id)
);

 

Note that cqlsh's rendering of a traced request omits the request parameters as well as the event thread. The parameters are redundant when dealing with an interactive query from cqlsh, but can be useful when probabilistic tracing is enabled (see below).
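
To see the columns that cqlsh leaves out, you can query the saved rows directly. This is a minimal sketch that reuses the session id from the cqlsh output above; substitute the id of your own trace session.

```cql
-- Session-level row: includes the request parameters that cqlsh omits.
SELECT coordinator, started_at, duration, request, parameters
FROM system_traces.sessions
WHERE session_id = 4ad36250-1eb4-11e2-0000-fe8ebeead9f9;

-- Per-event rows: includes the thread that handled each step.
-- Rows come back ordered by event_id, the clustering column.
SELECT activity, source, source_elapsed, thread
FROM system_traces.events
WHERE session_id = 4ad36250-1eb4-11e2-0000-fe8ebeead9f9;
```

Because event_id is a timeuuid, the events for a session should come back in roughly the order they were recorded.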

Also note that activity is a simple text field. Do not rely on this remaining unchanged in future Cassandra releases; it is likely that this will be changed to some kind of enum to make mechanical trace processing easier.

Probabilistic tracing

Besides tracing requests interactively, a coordinator can also be told to trace a proportion of all requests it handles with nodetool settraceprobability. This is useful if your application observes intermittent query slowness, but you're not sure which queries are responsible.

Be judicious with this: tracing a request will usually require at least 10 rows to be inserted, so it is far from free. Unless you are under very light load, tracing all requests (probability 1.0) will probably overwhelm your system. I recommend starting with a small fraction, e.g. 0.001, and increasing that only if necessary.
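
Once probabilistic tracing has been running for a while, the sampled sessions can be reviewed from cqlsh. This is a rough sketch rather than a recommended workflow: the LIMIT of 50 is an arbitrary choice, and it assumes you will scan the duration column by eye, since sessions is keyed only by session_id.

```cql
-- List a sample of recorded sessions with their request and parameters.
-- LIMIT 50 is arbitrary; raise or lower it to suit the volume of traces.
SELECT session_id, started_at, duration, request, parameters
FROM system_traces.sessions
LIMIT 50;
```

Any session that looks slow can then be looked up in the events table by its session_id to see where the time went.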
