May 10, 2022

Cassandra Myth Busters: How Hard Is It to Run Cassandra on Kubernetes?

It’s said to be difficult to run stateful workloads like Cassandra on Kubernetes. In this blog, we’ll take a fresh look at that claim.

In our first “myth buster” blog, we took an objective look at the difficulties said to be associated with Apache Cassandra® and evaluated them in light of Cassandra’s current capabilities and the tools developed to reduce its complexity. Cassandra supports cloud-native workloads, and Stargate.io, a flexible API layer, makes it easier to work with.

Now, in this post, we’ll examine how difficult it is to run stateful workloads like Cassandra on Kubernetes.

Myth: Running stateful workloads like Cassandra on Kubernetes is difficult

Life is free and easy for a stateless application. For a stateful application, not so much, because the requirements are different. Storage needs to persist and follow a workload if it’s rescheduled onto another node, and so does the workload’s identity, such as its network name.

The core of the challenge for stateful application developers is the nature of Kubernetes pods and the containers within them: they are ephemeral, and so is any data stored inside them. So, what are you to do?

Reality: That was then, this is now

Kubernetes and Cassandra work hand in hand to create a platform for a new generation of cloud-native applications, including those with stateful requirements. Kubernetes has played a pivotal role in the emergence of modern, cloud-native, microservice-oriented applications.

But there was a catch: Kubernetes was built for stateless applications and ephemeral workloads, and databases simply didn’t belong in that environment.

Today, it’s a different story: Kubernetes powers a new generation of stateful, data-aware applications with improved scalability, reliability, and ease of management. This is the culmination of several important innovations that arrived in concert:

  • Changes to the Kubernetes ecosystem
  • The creation of the Kubernetes operator for Cassandra
  • K8ssandra

Changes to the Kubernetes ecosystem

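Various native Kubernetes resources provide the basic building blocks needed to host stateful applications within Kubernetes. These include APIs such as StatefulSets, PersistentVolumes, PersistentVolumeClaims, and StorageClasses. StatefulSets give each pod a stable name and its own storage, PersistentVolumeClaims request storage that outlives any individual pod, and StorageClasses describe how that storage is provisioned.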
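To make these building blocks concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dictionary and serializes it to JSON, which `kubectl apply -f` accepts. The names used here (`cassandra`, `cassandra-data`, the `fast-ssd` StorageClass, the `cassandra:4.0` image tag) are hypothetical placeholders. The `stable_identities` helper spells out the naming convention Kubernetes applies to StatefulSets: pods get ordinal names, each pod gets a DNS name through the governing headless Service, and each volumeClaimTemplate yields a PersistentVolumeClaim named `<template>-<pod>`. That fixed claim name is why a rescheduled pod reattaches to the same storage.

```python
import json


def cassandra_statefulset(name: str, replicas: int, storage_class: str, size: str) -> dict:
    """Build a minimal apps/v1 StatefulSet manifest, for illustration only."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": "cassandra:4.0",  # hypothetical image tag
                        "volumeMounts": [{
                            "name": "cassandra-data",
                            "mountPath": "/var/lib/cassandra",
                        }],
                    }],
                },
            },
            # Kubernetes creates one PersistentVolumeClaim per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "cassandra-data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def stable_identities(manifest: dict, namespace: str = "default") -> list:
    """Derive the pod name, DNS name, and PVC names assigned to each replica."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    templates = [t["metadata"]["name"] for t in spec["volumeClaimTemplates"]]
    result = []
    for ordinal in range(spec["replicas"]):
        pod = f"{name}-{ordinal}"
        result.append({
            "pod": pod,
            # Assumes the default cluster domain, cluster.local.
            "dns": f"{pod}.{spec['serviceName']}.{namespace}.svc.cluster.local",
            "pvcs": [f"{t}-{pod}" for t in templates],
        })
    return result


if __name__ == "__main__":
    sts = cassandra_statefulset("cassandra", 3, "fast-ssd", "10Gi")
    print(json.dumps(sts, indent=2))
    for ident in stable_identities(sts):
        print(ident["pod"], ident["dns"], ident["pvcs"])
```

For the DNS names to resolve, a headless Service with the name given in `serviceName` must also exist; it is omitted from this sketch.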

The Container Storage Interface (CSI) has made it much easier for third-party storage providers to bring new storage systems to Kubernetes. These systems offer the flexibility needed to meet the varying storage requirements of many different types of applications.
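With CSI, a StorageClass simply names the CSI driver that provisions its volumes. The sketch below assumes the AWS EBS CSI driver (provisioner `ebs.csi.aws.com` with its `type` parameter) purely as an example; other providers use their own driver names and parameters, and the class name `fast-ssd` is a hypothetical placeholder. Two settings matter for databases: `reclaimPolicy: Retain` keeps the underlying volume if its claim is deleted, and `volumeBindingMode: WaitForFirstConsumer` delays provisioning until a pod using the claim is scheduled, so the volume is created where that pod can reach it.

```python
import json


def csi_storage_class(name: str, provisioner: str, parameters: dict) -> dict:
    """Build a storage.k8s.io/v1 StorageClass backed by a CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Name of the CSI driver; provider-specific.
        "provisioner": provisioner,
        # Passed through to the driver; valid keys depend on the driver.
        "parameters": parameters,
        # Keep the PersistentVolume (and its data) when the claim is deleted.
        "reclaimPolicy": "Retain",
        # Provision only once a pod that uses the claim is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


if __name__ == "__main__":
    # Assumes the AWS EBS CSI driver; substitute your provider's driver and parameters.
    sc = csi_storage_class("fast-ssd", "ebs.csi.aws.com", {"type": "gp3"})
    print(json.dumps(sc, indent=2))
```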

Kubernetes operator for Cassandra

The creation of the Kubernetes operator for Cassandra, Cass Operator, provided a native Kubernetes experience for deploying and managing Cassandra datacenters within a Kubernetes cluster. Cass Operator builds on these Kubernetes resources to provide a flexible and robust Cassandra deployment, integrating and standardizing the capabilities required to deliver a cohesive Cassandra experience on Kubernetes.
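Operators such as Cass Operator follow the Kubernetes controller pattern: they repeatedly compare the desired state declared in a resource with the observed state of the cluster and take steps to close the gap. The toy sketch below illustrates only that idea, for a single field (node count). It is not Cass Operator’s actual logic, and the `Action` type and step names are hypothetical. A real operator must also handle configuration changes, rolling restarts, and failures, and for Cassandra it has to decommission a node before removing it so that node’s data streams to the remaining replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical step a controller might take for one replica."""
    kind: str     # "add_node" or "remove_node"
    ordinal: int  # which replica the step applies to


def reconcile(desired_size: int, observed_size: int) -> list:
    """Return the actions that would move the observed node count to the desired one.

    Toy version: only node count is considered, and replicas are assumed to be
    numbered contiguously from 0, as StatefulSet pods are.
    """
    if observed_size < desired_size:
        # Scale up, lowest missing ordinal first.
        return [Action("add_node", i) for i in range(observed_size, desired_size)]
    if observed_size > desired_size:
        # Scale down from the highest ordinal. In Cassandra each removal would
        # first need a decommission so the node's data streams to its peers.
        return [Action("remove_node", i) for i in range(observed_size - 1, desired_size - 1, -1)]
    return []  # observed state already matches desired state


def run_until_converged(desired_size: int, observed_size: int, max_passes: int = 10) -> int:
    """Simulate a control loop that applies one action per pass, then re-observes."""
    for _ in range(max_passes):
        actions = reconcile(desired_size, observed_size)
        if not actions:
            break
        # Pretend the first action succeeded and update the observed state.
        observed_size += 1 if actions[0].kind == "add_node" else -1
    return observed_size


if __name__ == "__main__":
    print(reconcile(3, 1))            # two add_node actions, ordinals 1 and 2
    print(run_until_converged(3, 0))  # converges to 3
```

Keeping `reconcile` a pure function of desired and observed state is what makes the loop safe to rerun: if a pass is interrupted, the next pass simply recomputes the remaining work.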

Cass Operator provides a Kubernetes custom resource called CassandraDatacenter. This resource is the abstraction layer: it takes configuration supplied through Kubernetes and translates it into the configuration of the Cassandra deployment the operator manages. It also exposes the state of that deployment, which can be inspected like any other native Kubernetes resource.
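A minimal CassandraDatacenter might look like the following, built as a Python dictionary and serialized to JSON for `kubectl apply -f`. The field names (`apiVersion: cassandra.datastax.com/v1beta1`, `clusterName`, `serverType`, `serverVersion`, `size`, `storageConfig.cassandraDataVolumeClaimSpec`) follow the examples published with Cass Operator as we understand them; treat them as assumptions and check the documentation for the operator version you run. The values (`cluster1`, `dc1`, `4.0.1`, `fast-ssd`) are hypothetical. The `is_ready` helper reads `status.cassandraOperatorProgress`, which Cass Operator reports as `Ready` once the datacenter has settled; that field name is likewise an assumption to verify.

```python
import json


def cassandra_datacenter(cluster: str, dc: str, size: int, storage_class: str) -> dict:
    """Build an illustrative CassandraDatacenter custom resource."""
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": dc},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",
            "serverVersion": "4.0.1",  # hypothetical version choice
            "size": size,              # number of Cassandra nodes in this datacenter
            "storageConfig": {
                # Used as the PersistentVolumeClaim spec for each node's data volume.
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            },
        },
    }


def is_ready(resource: dict) -> bool:
    """Check the operator-reported progress on a resource read back from the API server.

    Assumes the status field is named `cassandraOperatorProgress`.
    """
    return resource.get("status", {}).get("cassandraOperatorProgress") == "Ready"


if __name__ == "__main__":
    dc = cassandra_datacenter("cluster1", "dc1", 3, "fast-ssd")
    print(json.dumps(dc, indent=2))
    # Roughly what `kubectl get cassandradatacenter dc1 -o json` might return once settled.
    observed = {**dc, "status": {"cassandraOperatorProgress": "Ready"}}
    print(is_ready(observed))
```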

K8ssandra eases use of Cassandra on Kubernetes

The latest iteration in this process is K8ssandra: a complete data platform built on Kubernetes and Cassandra that incorporates the capabilities of Cass Operator.

K8ssandra abstracts away its component technologies and integrates essential supporting services, such as repair, backup and restore, monitoring, and data gateway APIs. K8ssandra is open source, works with the latest Cassandra releases, and continuously evolves to meet the production needs of the community.
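For comparison, with the K8ssandra Operator a single K8ssandraCluster resource declares the Cassandra datacenters together with supporting services. This sketch assumes the `k8ssandra.io/v1alpha1` API, with `spec.cassandra.datacenters`, a per-datacenter `stargate` section, and a cluster-level `reaper` section, as we understand them from the K8ssandra Operator’s examples; verify the field names against the docs for your version. Medusa (backup and restore) and monitoring are left out because they need environment-specific settings, and all names and sizes are hypothetical.

```python
import json


def k8ssandra_cluster(name: str, dc: str, size: int, stargate_replicas: int) -> dict:
    """Build an illustrative K8ssandraCluster resource (field names are assumptions)."""
    return {
        "apiVersion": "k8ssandra.io/v1alpha1",
        "kind": "K8ssandraCluster",
        "metadata": {"name": name},
        "spec": {
            "cassandra": {
                "serverVersion": "4.0.1",  # hypothetical version choice
                "datacenters": [{
                    "metadata": {"name": dc},
                    "size": size,
                    "storageConfig": {
                        "cassandraDataVolumeClaimSpec": {
                            "storageClassName": "fast-ssd",  # hypothetical class
                            "accessModes": ["ReadWriteOnce"],
                            "resources": {"requests": {"storage": "10Gi"}},
                        },
                    },
                    # Stargate data gateway nodes for this datacenter.
                    "stargate": {"size": stargate_replicas},
                }],
            },
            # Assumption: an empty section deploys Reaper (repair) with default settings.
            "reaper": {},
        },
    }


def declared_services(resource: dict) -> list:
    """List which supporting services this sketch's resource declares."""
    spec = resource["spec"]
    services = []
    if "reaper" in spec:
        services.append("reaper")
    if any("stargate" in d for d in spec["cassandra"]["datacenters"]):
        services.append("stargate")
    return services


if __name__ == "__main__":
    cluster = k8ssandra_cluster("demo", "dc1", 3, 1)
    print(json.dumps(cluster, indent=2))
    print(declared_services(cluster))
```

The point of the comparison is the shape rather than the exact fields: one declarative resource replaces the separate manifests for the database and each supporting service.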

Figure 1: K8ssandra is a DataStax-contributed open-source project that enables you to run Cassandra on Kubernetes with the tools you’ll need for production deployments.

Summary

As Kubernetes has evolved, so too has the technology surrounding Cassandra. With the development of Cass Operator and K8ssandra, Kubernetes and Cassandra now provide a platform for modern cloud-native applications, including those with stateful requirements.

These innovations solve some very challenging technology problems. They are proof of the commitment and talent of the open-source contributors (including DataStax engineers) who made them possible. As a result, we can now claim Cassandra as the default data tier for building and running powerful, resilient, truly cloud-native data apps on Kubernetes.

Follow the DataStax Tech Blog for more developer stories. Check out the DataStax YouTube channel for tutorials and DataStax Developers on Twitter for the latest news about our developer community.
