May 17, 2022

Moving to Cloud-Native Applications and Data with Kubernetes and Apache Cassandra

Moving your applications to run in the cloud is attractive to developers. Who doesn’t like the idea of being able to easily scale out and have someone else worry about the hardware? However, designing applications with cloud-native methodologies involves more than migrating them to a cloud platform or using cloud services.

What does this mean in practice? It involves understanding the role that containers and orchestration tools play in automating your applications, how to use APIs effectively, and how other elements such as data are affected by dynamic changes to your application infrastructure. More specifically, it means running your application on virtually unlimited compute and storage in the cloud alongside a move to distributed data. Apache Cassandra®, the linearly scalable, highly fault-tolerant, and always-available NoSQL database, was built for cloud data and is becoming a popular choice among developers for cloud-native applications.

How did we get here?

Over the past twenty years, there have been several big trends in distributed computing. Reliable, large-scale networking was the main focus in the 2000s; it linked multiple locations and services together so they could operate at the speed and volume the Internet demanded. In the 2010s came the move of compute and storage to the cloud, which used that distributed network to connect application infrastructure on demand and elastically. That works well for the application itself, but it doesn’t change how we’ve been managing data.

Managing a distributed database like Cassandra can be complex. To handle reads and writes across multiple servers, you need to understand the tradeoffs described by Brewer’s theorem, also known as the CAP theorem. It covers Consistency (whether every node returns the same, most recent data), Availability (whether every request receives a response), and Partition tolerance (whether the system keeps working when network failures cut nodes or locations off from each other). When a network partition occurs, a distributed system has to choose between staying consistent and staying available. In other words, the theorem governs how the database behaves under non-ideal conditions: the inevitable failures that happen in a system with many moving parts.
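Cassandra exposes this tradeoff per request through tunable consistency levels such as ONE and QUORUM, where QUORUM means a majority of the replicas for a piece of data. The sketch below is a simplified model that ignores failures and conflict resolution: it computes the quorum size for a replication factor and checks the usual overlap rule, that a read is guaranteed to reach at least one replica holding the latest write when read replicas plus write replicas exceed the replication factor. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a QUORUM read or write: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every possible read replica set overlaps every write replica set.

    If R + W > RF, any R replicas and any W replicas share at least one node,
    so a read always contacts a replica that acknowledged the latest write.
    """
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)  # 2 of 3 replicas
    print(q, reads_see_latest_write(rf, q, q))  # QUORUM writes + QUORUM reads -> 2 True
    print(reads_see_latest_write(rf, 1, 1))     # ONE writes + ONE reads -> False
```

Writing and reading at ONE favors availability and latency; writing and reading at QUORUM gives up some availability during failures in exchange for reads that reflect the latest acknowledged write.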

Not only does your database have to handle failures, it also has to balance consistency, availability, and partition tolerance across multiple locations while doing so. This is exactly what Cassandra was built to do, and it has proven itself under just those tough conditions. Because Cassandra was designed as a distributed system from the beginning, it can run in hybrid-cloud, multi-cloud, and geographically distributed environments. As applications must withstand failures and scale on demand, Cassandra has become a database of choice for developers.
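One concrete way Cassandra supports geographically distributed deployments is per-data-center replication with `NetworkTopologyStrategy`, set when a keyspace is created. The helper below is a hypothetical sketch that builds such a CQL statement; the keyspace name and the data center names (`us_east`, `eu_west`) are assumptions, and in a real cluster they must match the data center names Cassandra reports.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that places replicas in each data center."""
    options = {"class": "NetworkTopologyStrategy", **replicas_per_dc}
    # Render a CQL map literal: string values quoted, replica counts left bare.
    rendered = ", ".join(
        f"'{key}': {value}" if isinstance(value, int) else f"'{key}': '{value}'"
        for key, value in options.items()
    )
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{rendered}}};"


if __name__ == "__main__":
    # Hypothetical keyspace with three replicas in each of two regions.
    print(create_keyspace_cql("orders", {"us_east": 3, "eu_west": 3}))
```

With three replicas per data center, each region can serve reads and writes locally, and the keyspace keeps working if one region becomes unreachable.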

What’s next?

Today, more developers are using microservice designs to decompose applications into smaller, more manageable units. Each unit serves a specific purpose and can scale independently using containers. To manage these container instances, Kubernetes has become the de facto container orchestration tool.
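In Kubernetes, independent scaling of each service is commonly automated with the Horizontal Pod Autoscaler. Its core rule, as described in the Kubernetes documentation, is desired replicas = ceil(current replicas × current metric / target metric). The sketch below implements only that rule plus clamping to a minimum and maximum; the real controller also applies a tolerance and stabilization windows, and the default bounds here are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Core Horizontal Pod Autoscaler scaling rule, clamped to [min, max].

    Omits the tolerance and stabilization behavior of the real controller.
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 60% target: scale out to 5 pods.
    print(desired_replicas(3, 90, 60))
    # 4 pods averaging 20% CPU against a 60% target: scale in to 2 pods.
    print(desired_replicas(4, 20, 60))
```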

Kubernetes can create new container instances as needed, which helps scale the compute power available to an application. Kubernetes also tracks the health of running containers: if a container fails, Kubernetes restarts it, and if the machine it runs on fails, Kubernetes schedules a replacement on other hardware. You can rapidly build microservice-powered applications and ensure they run as designed on any Kubernetes platform. Keeping an application running continuously without downtime, even when things go wrong, is a powerful attribute of Kubernetes.
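Behind this behavior is the Kubernetes control-loop idea: controllers repeatedly compare the desired state with the observed state and act to close the gap. The sketch below is a deliberately simplified, hypothetical model of one pass of such a loop (the `Pod` class, node names, and action strings are all illustrative). In real Kubernetes, in-place container restarts are done by the kubelet and replacements by controllers together with the scheduler.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    node: str
    healthy: bool


def reconcile(desired: int, pods: list[Pod], healthy_nodes: list[str]) -> list[str]:
    """One pass of a simplified control loop: compare desired and observed state."""
    actions: list[str] = []
    kept: list[Pod] = []
    for pod in pods:
        if pod.node not in healthy_nodes:
            # The machine is gone, so the pod must be replaced elsewhere.
            actions.append(f"delete {pod.name} (node {pod.node} unavailable)")
        elif not pod.healthy:
            # The node is fine but the container failed: restart it in place.
            actions.append(f"restart {pod.name} on {pod.node}")
            kept.append(pod)
        else:
            kept.append(pod)

    # Schedule replacements round-robin across healthy nodes until counts match.
    if healthy_nodes:
        for i in range(desired - len(kept)):
            actions.append(f"create pod on {healthy_nodes[i % len(healthy_nodes)]}")
    # Remove surplus pods if there are more than desired.
    for pod in kept[desired:]:
        actions.append(f"delete {pod.name} (scale down)")
    return actions


if __name__ == "__main__":
    pods = [
        Pod("web-0", "node-a", healthy=True),
        Pod("web-1", "node-b", healthy=False),
        Pod("web-2", "node-c", healthy=True),
    ]
    for action in reconcile(3, pods, healthy_nodes=["node-a", "node-b"]):
        print(action)
```

Because the loop only ever compares "what should exist" with "what does exist", the same logic covers crashes, lost machines, and deliberate scaling changes.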

To run Apache Cassandra on Kubernetes, you typically deploy a Cassandra operator within your Kubernetes cluster. This lets Cassandra nodes run on top of your existing Kubernetes cluster as a service. See the official Cass Operator documentation, or get started with the DataStax Cass Operator repository on GitHub (both are listed under Resources). Operators provide an interface between Kubernetes and complex stateful systems like Cassandra, so you can manage them together. The operator handles starting a Cassandra cluster, scaling it, and dealing with failures in a way that Cassandra understands.
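With DataStax Cass Operator installed, you describe a Cassandra data center declaratively as a `CassandraDatacenter` custom resource and the operator does the rest. The sketch below builds such a resource in Python and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The field layout follows the examples published with cass-operator as best we know them, but treat it as an assumption and check it against the operator release you install; the cluster and data center names, server version, storage size, and storage class (`standard`) are all hypothetical.

```python
import json


def cassandra_datacenter(name: str, cluster: str, size: int, storage: str = "5Gi",
                         storage_class: str = "standard") -> dict:
    """Build a CassandraDatacenter custom resource for cass-operator (sketch)."""
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",
            "serverVersion": "4.0.1",  # assumption: use a version your operator supports
            "size": size,  # number of Cassandra nodes (pods) in this data center
            "storageConfig": {
                # Each Cassandra pod gets its own PersistentVolumeClaim from this spec.
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,  # hypothetical storage class
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


if __name__ == "__main__":
    # Save the output to a file and apply it, e.g. `kubectl apply -f dc1.json`.
    print(json.dumps(cassandra_datacenter("dc1", "cluster1", 3), indent=2))
```

Scaling the data center is then a matter of changing `size` and re-applying the resource; the operator decides how to add or remove Cassandra nodes safely.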

Because Cassandra nodes are stateful, you also need to provision storage resources in your Kubernetes cluster. StatefulSets, combined with PersistentVolumes, give each Cassandra pod a stable identity and ensure that the same data volume is reattached to it after any restart or rescheduling. Cassandra containers are designed to keep their data on these external volumes, which is key to a successful cluster deployment. When properly configured, a single YAML file can deploy both the application and data tiers consistently across a variety of environments.
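The stable pairing between a pod and its volume comes from StatefulSet naming rules: pods are named `<statefulset>-<ordinal>`, and each `volumeClaimTemplates` entry produces a PersistentVolumeClaim named `<template>-<pod name>`. The hypothetical helper below just spells out that mapping; the names `cassandra` and `data` are illustrative, and an operator such as Cass Operator chooses its own names.

```python
def statefulset_identities(statefulset: str, claim_template: str, replicas: int) -> dict[str, str]:
    """Map each StatefulSet pod name to the PersistentVolumeClaim it is bound to.

    Both names depend only on the pod's ordinal, so a pod that is restarted or
    rescheduled to another machine comes back with the same name and is
    reattached to the same claim, and therefore the same data.
    """
    return {
        f"{statefulset}-{ordinal}": f"{claim_template}-{statefulset}-{ordinal}"
        for ordinal in range(replicas)
    }


if __name__ == "__main__":
    for pod, claim in statefulset_identities("cassandra", "data", 3).items():
        print(pod, "->", claim)
```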

Planning ahead

As you adopt microservices and application containers, you can take advantage of fully distributed computing to scale out. However, to get the most out of this, you need to include distributed data in your planning. Kubernetes makes it easier to automate and manage cloud-native applications, and Cassandra completes the picture for the data tier.

In short, bringing Cassandra and Kubernetes together makes it easier to scale out applications. Planning this process involves understanding how distributed compute and distributed data work together, so you can take full advantage of what cloud-native applications can deliver.

Join our Apache Cassandra Operations in Kubernetes Certification Program to level up your skills and become a certified Cassandra and Kubernetes expert. Follow us on Medium or subscribe to our YouTube channel for regular Cassandra workshops. If you have any questions, join us on DataStax Community, the Stack Overflow for Cassandra.

DataStax is proud to support DoK Day Europe 2022 and KubeCon + CloudNativeCon Europe, which are being held this week, both virtually and in person, in Valencia, Spain. We hope to see you there! Join us for a coding challenge and, if you’re attending in person, stop by the DataStax booth to talk about your project with our engineers.

Resources

  1. Apache Cassandra®
  2. Kubernetes
  3. What is Cass Operator?
  4. Create a Kubernetes cluster
  5. DataStax Kubernetes Operator for Apache Cassandra®
  6. GitHub: DataStax Cass Operator
  7. Apache Cassandra Operations in Kubernetes Certification Program
  8. DataStax Medium
  9. DataStax YouTube Channel
  10. DataStax Community