February 19, 2016

How to Write a Dtest


What are Dtests?

Apache Cassandra’s functional test suite, cassandra-dtest (“dtest” is short for “distributed test”), is an open-source Python project on GitHub where much of the Apache Cassandra test automation effort takes place. Unlike Cassandra’s unit tests, dtests are end-to-end, black-box tests that run against Cassandra clusters via CCM. The Cassandra Cluster Manager (CCM) is a Python library that runs local C* clusters by hosting multiple JVMs on the same machine. Each test takes anywhere from thirty seconds to several minutes to run. Many are general-purpose functional tests, while others are regression tests for specific tickets in the Apache Cassandra JIRA.

Where are Dtests used?

Continuous integration for dtests runs on a publicly accessible Jenkins server at cassci.datastax.com. As patches are written, contributors can use CassCI to run the C* unit tests and the dtest suite against their new code, as discussed here.

Writing a Dtest

Adding a new dtest is quite simple. Choose the appropriate module and/or test suite for your new test, or add one if necessary. Add a new test method to the file you’ve chosen, and make sure its name begins with “test”, or nosetests won’t pick it up. Now is a good time to add your test’s docstring. The docstring should describe what your test verifies and how, and should include some doxygen markup. See dtest’s contributing.md for the doxygen annotations to use.
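
As a rough sketch of the naming rule and docstring placement, the example below uses the standard library’s unittest as a stand-in for dtest’s own base class (real dtests subclass dtest’s Tester); the class and method names are hypothetical. unittest’s default loader is stricter than nose and collects only methods whose names start with “test”, so a name that begins with “test” is picked up by both. The @jira_ticket line only illustrates where doxygen annotations go; check contributing.md for the tags dtest actually expects.

```python
import unittest


class ExampleTest(unittest.TestCase):
    """Stand-in for a dtest test class; real dtests subclass dtest's Tester."""

    def test_insert_then_read(self):
        """
        Verify that a row written with INSERT can be read back with SELECT.

        @jira_ticket CASSANDRA-XXXX (placeholder; see contributing.md for real tags)
        """
        pass  # cluster setup and assertions would go here

    def insert_then_read(self):
        """No "test" prefix, so the default loader does not collect this method."""
        pass


if __name__ == "__main__":
    # Show which methods the default loader treats as tests.
    print(unittest.TestLoader().getTestCaseNames(ExampleTest))  # ['test_insert_then_read']
```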

Now that the boilerplate is taken care of, you’re ready to begin writing your test. The first step is to launch a C* cluster, like so:

cluster = self.cluster
cluster.populate(3).start(wait_for_binary_proto=True)

You can modify the number of nodes in the cluster, the number of datacenters, or any of the cassandra.yaml options.

cluster = self.cluster
cluster.set_configuration_options(values={'hinted_handoff_enabled': False})  # Set a cassandra.yaml option
cluster.populate([2, 2]).start(wait_for_binary_proto=True)  # A four-node cluster: two nodes in each of two datacenters

Remember that this uses CCM, so every node runs as a separate JVM on your own machine. For that reason, it’s best not to launch more than five nodes; most tests use three.
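
To make the list form of populate() concrete, the hypothetical helper below (not part of ccmlib) expands a per-datacenter node count list into the node-to-datacenter layout it describes. It assumes CCM’s usual naming (node1, node2, ... and dc1, dc2, ...) and rejects layouts that exceed the five-node guideline above.

```python
from typing import Dict, List

MAX_LOCAL_NODES = 5  # rule of thumb from the text: each node is a JVM on one machine


def expand_layout(counts: List[int]) -> Dict[str, str]:
    """Map node names to datacenter names for a populate([...])-style list.

    Each list entry is the node count for one datacenter. The names node1, node2, ...
    and dc1, dc2, ... are assumed to follow CCM's convention.
    """
    total = sum(counts)
    if total > MAX_LOCAL_NODES:
        raise ValueError(
            f"{total} nodes requested; keep local clusters to {MAX_LOCAL_NODES} or fewer"
        )
    layout: Dict[str, str] = {}
    node_id = 1
    for dc_index, count in enumerate(counts, start=1):
        for _ in range(count):
            layout[f"node{node_id}"] = f"dc{dc_index}"
            node_id += 1
    return layout


print(expand_layout([2, 2]))
# {'node1': 'dc1', 'node2': 'dc1', 'node3': 'dc2', 'node4': 'dc2'}
```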

To create an object representing a connection to your C* cluster, you’ll want to use one of the following methods from dtest.py:

def cql_connection(self, node)
def exclusive_cql_connection(self, node)
def patient_cql_connection(self, node)
def patient_exclusive_cql_connection(self, node)

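Use patient_cql_connection unless you have a specific need for one of the others. The patient variants keep retrying until the node accepts connections, which avoids spurious failures right after startup; the exclusive variants send every request to the given node only, instead of load balancing across the cluster.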
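
The “patient” idea is a retry loop with a deadline. The sketch below shows that general pattern; it is not dtest’s implementation, and connect_fn is a hypothetical stand-in for whatever actually opens the driver session.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def patient_connect(connect_fn: Callable[[], T], timeout: float = 30.0,
                    interval: float = 0.5) -> T:
    """Call connect_fn until it succeeds or `timeout` seconds have passed.

    Hypothetical helper: connect_fn stands in for opening a driver session.
    A real version would catch the driver's connection error specifically
    rather than every Exception.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return connect_fn()
        except Exception:
            if time.monotonic() >= deadline:
                raise  # out of patience: surface the most recent failure
            time.sleep(interval)


# Usage with a fake connection that fails twice before succeeding.
attempts = {"count": 0}


def flaky_connect() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("node not accepting connections yet")
    return "session"


print(patient_connect(flaky_connect, timeout=5.0, interval=0.01))  # prints "session"
```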

cluster = self.cluster
cluster.populate(3).start(wait_for_binary_proto=True)
node1, node2, node3 = cluster.nodelist()

session = self.patient_cql_connection(node1)

From here on, you write the actual testing logic. Use the Python driver to interact with C*, mostly via CQL, and the ccmlib API to run cassandra-stress, nodetool, or any other tool that ships with the C* source.

session.execute("CREATE KEYSPACE ks WITH replication = { 'class':'SimpleStrategy', 'replication_factor':1} AND DURABLE_WRITES = true")
session.execute("USE ks")
session.execute("CREATE TABLE t (id int PRIMARY KEY, v int)")
session.execute("INSERT INTO t (id, v) VALUES (1, 2)")
rows = session.execute("SELECT * FROM t")
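
The keyspace above uses SimpleStrategy with a replication factor of 1, which suits a single-datacenter cluster. On a multi-datacenter cluster such as populate([2, 2]), a keyspace normally uses NetworkTopologyStrategy with a replication factor per datacenter. Here is a small hypothetical helper (not part of dtest.py) that builds either statement; it assumes CCM’s dc1/dc2 datacenter names and that its arguments are trusted test identifiers.

```python
from typing import Dict, Union


def create_keyspace_cql(name: str, rf: Union[int, Dict[str, int]]) -> str:
    """Build a CREATE KEYSPACE statement.

    An int rf gives SimpleStrategy; a dict maps datacenter name -> replication
    factor and gives NetworkTopologyStrategy. Values are interpolated directly,
    so only pass trusted identifiers (fine for tests, not for user input).
    """
    if isinstance(rf, int):
        replication = f"'class': 'SimpleStrategy', 'replication_factor': {rf}"
    else:
        per_dc = ", ".join(f"'{dc}': {count}" for dc, count in rf.items())
        replication = f"'class': 'NetworkTopologyStrategy', {per_dc}"
    return f"CREATE KEYSPACE {name} WITH replication = {{{replication}}}"


print(create_keyspace_cql("ks", {"dc1": 2, "dc2": 2}))
# CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2}
```

In a test, the resulting string would be passed to session.execute() just like the hand-written statement above.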

node1, node2, node3 = cluster.nodelist()

node1.stress(['write', 'n=1M', '-rate', 'threads=10'])
node2.decommission()
node3.repair()

To verify C*’s behavior, use the helpers in dtest’s assertions.py or the built-in assertion methods of Python’s unittest.

from assertions import assert_one

session.execute("CREATE KEYSPACE ks WITH replication = { 'class':'SimpleStrategy', 'replication_factor':1} AND DURABLE_WRITES = true")
session.execute("USE ks")
session.execute("CREATE TABLE t (id int PRIMARY KEY, v int)")
session.execute("INSERT INTO t (id, v) VALUES (1, 2)")
assert_one(session, "SELECT * FROM t", [1, 2])
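
assert_one fails unless the query returns exactly one row and that row matches the expected values. The sketch below shows that idea against a plain list of already-fetched rows instead of a live session; check_one is a hypothetical, simplified illustration, not the code in assertions.py.

```python
from typing import Any, Iterable, Sequence


def check_one(rows: Iterable[Sequence[Any]], expected: Sequence[Any]) -> None:
    """Fail unless `rows` contains exactly one row equal to `expected`.

    Simplified stand-in for the idea behind assert_one: it takes rows that were
    already fetched instead of a session and a query, and compares each row as a list.
    """
    as_lists = [list(row) for row in rows]
    if len(as_lists) != 1:
        raise AssertionError(f"Expected exactly one row, got {len(as_lists)}: {as_lists}")
    if as_lists[0] != list(expected):
        raise AssertionError(f"Expected {list(expected)} but got {as_lists[0]}")


# Plain tuples stand in for the rows the driver would return.
check_one([(1, 2)], [1, 2])  # passes silently
```

Raising AssertionError with an explicit message, rather than using a bare assert, is what keeps the failure output useful.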

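rows = list(session.execute("SELECT * FROM t"))
self.assertEqual(len(rows), 1)   # exactly one row
self.assertEqual(rows[0][0], 1)  # id column of the first row
self.assertEqual(rows[0][1], 2)  # v column of the first row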

Use only these assertion helpers, not Python’s assert keyword: they produce much more informative output when a test fails.

rows = list(session.execute("SELECT * FROM t"))
assert rows[0][0] == 1  # Do not do this.
assert rows[0][1] == 2
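
To see the difference in failure output, the snippet below triggers the same mismatch both ways, using a bare unittest.TestCase instance so it runs outside a test runner. assertEqual reports both values and where they first differ, while a bare assert raises an AssertionError with no message at all (and is skipped entirely when Python runs with -O).

```python
import unittest

row = [1, 3]       # pretend the query returned this row...
expected = [1, 2]  # ...but the test expected this one

unittest_message = ""
bare_message = "<bare assert skipped: Python was run with -O>"

# unittest assertion: the message shows both values and the first difference.
case = unittest.TestCase()  # Python 3.2+ can instantiate TestCase without a test method
try:
    case.assertEqual(row, expected)
except AssertionError as err:
    unittest_message = str(err)

# Bare assert: the AssertionError carries no message at all.
try:
    assert row == expected
except AssertionError as err:
    bare_message = str(err)

print("assertEqual:", unittest_message)
print("bare assert:", repr(bare_message))
```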

There’s no need to check for errors in C* logs, as that is automatically handled for you by dtest’s teardown.
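
Conceptually, a check like that scans each node’s log for error-level entries and flags any that a test hasn’t explicitly allowed. Below is a minimal sketch of the idea over an in-memory list of log lines; the function name, the ignore-pattern argument, and the assumption that each entry starts with its log level are illustrative, not dtest’s actual code.

```python
import re
from typing import Iterable, List


def unexpected_errors(log_lines: Iterable[str], ignore_patterns: Iterable[str] = ()) -> List[str]:
    """Return ERROR-level lines that match none of the ignore patterns.

    Assumes each entry starts with its log level (e.g. "ERROR [main] ...").
    Hypothetical helper illustrating the idea behind dtest's teardown check.
    """
    ignores = [re.compile(p) for p in ignore_patterns]
    return [
        line for line in log_lines
        if line.startswith("ERROR") and not any(p.search(line) for p in ignores)
    ]


# Made-up sample lines in the assumed "LEVEL [thread] timestamp message" shape.
sample_log = [
    "INFO  [main] 2016-02-19 10:00:00,000 node startup message",
    "ERROR [main] 2016-02-19 10:00:05,000 unexpected failure",
    "ERROR [main] 2016-02-19 10:00:06,000 known harmless error",
]
print(unexpected_errors(sample_log, ignore_patterns=[r"known harmless error"]))
```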

Once you have finished with your test, make sure your new code is compliant with PEP8. See contributing.md for how to do so, along with further style guidelines. Now just open a pull request against the riptano/cassandra-dtest repository, and we’ll be happy to review and merge it.
