Guide | Mar 02, 2022

How to run Cassandra on Google Cloud


Apache Cassandra® delivers unmatched scalability, global reach, and seamless integration with Google Cloud services, making it a top choice for building high-performance, geo-distributed applications with zero downtime. This NoSQL database powers some of the world’s most demanding workloads, trusted by industry leaders like Netflix, Uber, and Pinterest, plus thousands of elite engineering teams.

In this guide, we’ll break down the best managed and self-managed ways to run Cassandra on Google Cloud, so you can harness its power with ease.

Three ways to run Cassandra on Google Cloud


Managed service: Using Astra DB on Google Cloud

The fastest way to use Cassandra on Google Cloud is with Astra DB, a database-as-a-service built on Cassandra, Kubernetes, Prometheus, Envoy, and other open source technologies. Astra DB simplifies cloud-native application development and requires no operations or self-management. It reduces deployment time from weeks to minutes, delivering a combination of serverless autoscaling, pay-as-you-go pricing, and an open source skillset you can take with you to any cloud provider.

Self-managed service: Cassandra on Google Compute Engine

Some IT organizations require complete control over their systems, or are already set up for self-managed software. With self-managed virtual machines, you have that control—but it comes with all the associated effort and expense, so it’s a tradeoff that should be considered carefully.


Self-managed service: K8ssandra on GKE

K8ssandra is a cloud-native distribution of Apache Cassandra® that runs on Kubernetes, including GKE. It bundles an ecosystem of tools that add richer data APIs and automated operations alongside Cassandra: metrics monitoring to promote observability, data anti-entropy services to support reliability, and backup/restore tools to support high availability and disaster recovery.

As part of K8ssandra's installation process, all of these components are installed and wired together, freeing you from having to perform the tedious plumbing of components like:

  • Apache Cassandra
  • Stargate, the open-source data gateway
  • Cass-operator, the Kubernetes Operator for Apache Cassandra
  • Reaper for Apache Cassandra, an anti-entropy repair feature (plus reaper-operator)
  • Medusa for Apache Cassandra for backup and restore (plus medusa-operator)
  • Metrics Collector for Apache Cassandra, with Prometheus integration, and visualization via pre-configured Grafana dashboards

| Option | Managed Service: Astra DB | Self-Managed: Cassandra on Google Compute Engine | Self-Managed: K8ssandra on GKE |
| --- | --- | --- | --- |
| Overview | Astra DB, a DBaaS built on Cassandra, simplifies deployment with no operational overhead | Full control via self-managed VMs, but requires significant setup and upkeep | Cloud-native Cassandra on Kubernetes, with automated tools for ops |
| Setup Time | Deploys in 5 minutes: select a region and go | Takes days to weeks: configure VMs, networking, and Cassandra manually | Hours to a day: install Terraform, Helm, and K8ssandra components |
| Scalability | Serverless autoscaling adjusts to traffic instantly | Manual scaling requires adding VMs and reconfiguring | Autoscaling via Kubernetes, but you manage cluster sizing |
| Management Effort | None: updates, backups, and repairs are automated | High: handle OS updates, security patches, and database tuning yourself | Moderate: tools like Reaper and Medusa automate some tasks |
| Cost Model | Pay-as-you-go; free tier offers 80 GB and 20M ops monthly | Fixed VM costs plus staff time for maintenance | GKE cluster costs plus effort to configure and monitor |
| Best For | Teams wanting speed and zero ops (e.g., developers, startups) | Orgs needing total control and with existing VM expertise | Kubernetes-savvy teams seeking cloud-native flexibility |
| Key Benefit | Global scale with no hassle: replicates across regions effortlessly | Complete customization, if you’ve got the resources to manage it | Rich ecosystem (Stargate, Prometheus) eases ops on Kubernetes |

Get started with Astra DB on Google Cloud

Astra DB makes running Apache Cassandra on Google Cloud fast and painless. Sign up with your GitHub account, Google ID, or email to get 80 GB of free storage and up to 20 million read/write operations per month; the free tier requires no credit card. Here’s how to dive in:

  1. Set Up Your Account: Create an account and log in to the Astra DB dashboard. Click “Create Database,” name your database and keyspace, select Google Cloud as your provider, and pick a region (e.g., us-central1). In under 5 minutes, your Cassandra instance is live and ready for action.
  2. Load Data with the CQL Console: Use the built-in CQL Console to run Cassandra Query Language (CQL) commands right from your browser. No extra software downloads or installs required. Try creating a table—say, CREATE TABLE users (id uuid PRIMARY KEY, name text)—and insert some data to see it in real time.
  3. Explore Tutorials and Videos: New to Cassandra? Check out Astra DB’s videos and documentation. The playlist offers bite-sized tutorials, like setting up a keyspace or connecting an app, that get you up to speed fast.
  4. Build with Sample Apps: DataStax provides a library of sample apps (think e-commerce backends or real-time analytics) to jumpstart your project. Pick one, adapt it to your needs, and deploy it.
  5. Connect Your Tools: Integrate Astra DB with your stack using drivers for Python, Java, or Node.js—or tap into REST and GraphQL APIs for flexibility. For DevOps, automate deployments with the Terraform Provider or Postman Collection. It’s all plug-and-play.
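
Steps 2 and 5 can also be done from application code. The following is a minimal sketch, not DataStax reference code: it assumes the DataStax Python driver (`cassandra-driver`) is installed and that you have downloaded a secure connect bundle and generated credentials from the Astra DB dashboard. The helper names `connect_astra` and `create_and_insert_user` are hypothetical.

```python
import uuid

# CQL from step 2; IF NOT EXISTS makes the statement safe to re-run.
CREATE_USERS = "CREATE TABLE IF NOT EXISTS users (id uuid PRIMARY KEY, name text)"
INSERT_USER = "INSERT INTO users (id, name) VALUES (%s, %s)"


def connect_astra(bundle_path, client_id, client_secret, keyspace):
    """Open a session to Astra DB (assumes `pip install cassandra-driver`)."""
    # Imported here so the rest of the sketch loads without the driver.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        cloud={"secure_connect_bundle": bundle_path},
        auth_provider=PlainTextAuthProvider(client_id, client_secret),
    )
    return cluster.connect(keyspace)


def create_and_insert_user(session, name):
    """Create the users table if needed, insert one row, and return its id."""
    user_id = uuid.uuid4()
    session.execute(CREATE_USERS)
    session.execute(INSERT_USER, (user_id, name))
    return user_id
```

With a real database you would pass the session returned by `connect_astra(...)` to `create_and_insert_user`; the bundle path, credentials, and keyspace name are yours to supply.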

Get started with Cassandra on Google Compute Engine

Running Cassandra on Google Compute Engine (GCE) gives you total control—but it’s a hands-on process. Here’s how to set it up step-by-step:

  1. Pick a Prebuilt Image (Development): For quick dev setups, grab a prebuilt virtual machine image with Cassandra installed from providers like Bitnami. Launch it via the Google Cloud Marketplace, pick a VM size (e.g., n2-standard-2), and SSH in to test.
  2. Build Your Own VM (Production): For test, staging, or production, start with a trusted base image (e.g., Ubuntu 20.04). Install Cassandra manually for security and performance: download it from the Apache Cassandra website, tweak cassandra.yaml, and strip out unnecessary dependencies.
  3. Plan Your Team’s Effort: Assess your staff’s skills. You’ll need expertise in Linux, networking, and Cassandra ops. Budget time for ongoing tasks—patching, tuning, and scaling—not just the initial setup.
  4. Set Up the VM: Launch a GCE instance (e.g., 4 vCPUs, 16 GB RAM) via the Google Cloud Console. After adding the Apache Cassandra apt repository, install Cassandra with sudo apt install cassandra, then start it with sudo systemctl start cassandra. Verify it’s running with nodetool status.
  5. Configure Networking: Set up a VPC, open ports (e.g., 9042 for CQL, 7000 for internode), and lock down firewall rules to your app’s IP range. Test connectivity—your app and cluster need to talk securely.
  6. Secure and Monitor: Configure firewall rules to allow monitoring tools like Prometheus to reach the cluster. Enable encryption in cassandra.yaml for data in transit. Watch for failed ops with nodetool and tweak heap size if memory spikes.
  7. Maintain the Cluster: Patch the OS and Cassandra regularly (e.g., sudo apt update && sudo apt upgrade). Scale by adding VMs and joining them to the cluster, then update seeds in the config. Plan backups with tools like Medusa and test restores quarterly.
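
The firewall setup in step 5 can be scripted. Here is a minimal sketch that builds `gcloud compute firewall-rules create` commands for the two ports named above. The rule names, the VPC name `cassandra-vpc`, and the CIDR ranges are hypothetical placeholders, and the script only prints the commands so you can review them before running anything.

```python
import shlex


def firewall_rule_cmd(name, network, port, source_range):
    """Return argv for a gcloud rule allowing one TCP port from one CIDR."""
    return [
        "gcloud", "compute", "firewall-rules", "create", name,
        f"--network={network}",
        "--direction=INGRESS",
        f"--allow=tcp:{port}",
        f"--source-ranges={source_range}",
    ]


# Hypothetical values: replace with your own VPC name and CIDR ranges.
RULES = [
    # CQL clients (your application subnet) on port 9042.
    firewall_rule_cmd("cassandra-cql", "cassandra-vpc", 9042, "10.10.0.0/24"),
    # Internode traffic (the Cassandra subnet itself) on port 7000.
    firewall_rule_cmd("cassandra-internode", "cassandra-vpc", 7000, "10.20.0.0/24"),
]

if __name__ == "__main__":
    for argv in RULES:
        print(shlex.join(argv))  # review the command, then run it in a shell
```

Keeping each rule to a single port and a single source range makes it easier to audit which clients can reach the cluster.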

Get started with K8ssandra on GKE

K8ssandra brings a cloud-native Cassandra experience to Google Kubernetes Engine (GKE) with built-in ops tools. Here’s how to get rolling:

  1. Install Core Tools: Download and install Terraform, the Google Cloud SDK, kubectl, and Helm v3 on your machine to power your GKE setup. Keep versions current for compatibility.
  2. Set Up Gcloud CLI: Install the gcloud CLI and authenticate with gcloud auth login. Initialize it with gcloud init, then set your project ID and region (e.g., gcloud config set project my-project).
  3. Clone the Project: Grab the k8ssandra-terraform repo with git clone https://github.com/k8ssandra/k8ssandra-terraform. Navigate into the directory—this is your launchpad.
  4. Provision GKE Infrastructure: Run terraform init and terraform apply to spin up a GKE cluster. Specify variables like cluster size (e.g., 3 nodes, n1-standard-4) in a terraform.tfvars file.
  5. Configure Kubectl: Fetch your cluster credentials with gcloud container clusters get-credentials my-cluster --region us-central1. Verify access with kubectl get nodes.
  6. Deploy K8ssandra: Add the K8ssandra Helm repo (helm repo add k8ssandra https://helm.k8ssandra.io/stable), then install it with helm install k8ssandra k8ssandra/k8ssandra. Watch it deploy Cassandra, Stargate, and Reaper.
  7. Access Credentials: Retrieve superuser credentials via kubectl get secret k8ssandra-superuser -o jsonpath="{.data.username}" | base64 -d (and repeat for password). Use these to log in and manage your cluster.
  8. Tune and Monitor: Check Grafana dashboards (preconfigured via Metrics Collector) for performance. Adjust resources in Helm values (e.g., cassandra.resources.requests.cpu) if pods struggle.
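
Step 7 decodes the secret one field at a time. As an alternative sketch, the function below decodes both fields from the JSON form of the same secret, as printed by `kubectl get secret k8ssandra-superuser -o json`. It relies only on the fact that Kubernetes stores secret values base64-encoded under `data`; the function name is hypothetical.

```python
import base64
import json


def decode_superuser_secret(secret_json):
    """Decode username and password from a Kubernetes Secret in JSON form."""
    data = json.loads(secret_json)["data"]
    return {
        key: base64.b64decode(data[key]).decode("utf-8")
        for key in ("username", "password")
    }
```

You could feed it the kubectl output captured with `subprocess.run(..., capture_output=True, text=True).stdout`. Avoid logging the returned password.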

Which is the most efficient way to run Cassandra on Google Cloud?

The answer depends on a host of factors: your requirements, your existing investments, and your staff and their skills.

In general, we recommend Astra DB for the vast majority of Cassandra use cases. You can be ready to go in minutes, freed from operational, security and scalability concerns.

All but the most demanding, security-conscious applications will be served by environments like Astra DB that are already compliant with common security standards, saving months or even years of effort, to say nothing of expense.

Startups and enterprises alike who do not want to, or cannot, dive deep into database administration and configuration should opt for Astra DB.

Self-managing databases on Kubernetes is less efficient than DBaaS, but may be driven by preexisting organizational proficiency with Kubernetes. A managed Kubernetes service like GKE, combined with K8ssandra, not only makes running system-of-engagement databases on Kubernetes possible but can significantly ease the burden on SRE/Ops teams.

Self-managing IaaS is the least efficient option relative to DBaaS, but may be driven by a need to self-manage for regulatory reasons or the need to interoperate with proprietary or custom systems. Alternatively, the choice may follow from the nature of an existing application that is being migrated to the cloud: it may simply not require, or be ready for, a cloud-native architecture.

Why use Astra DB to run Cassandra on Google Cloud?

Global scale

  • Scales to petabytes of data without slowing down
  • Colocates your data and apps worldwide—no trade-offs on performance or uptime
  • Replicates across GCP data centers, availability zones, or multiple regions, skipping leader/follower headaches
  • Separates compute and storage for cost-effective scaling—or scales down to zero when idle
  • Offers tunable consistency to balance availability and data accuracy across Cassandra nodes
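
To make the tunable-consistency bullet concrete: in Apache Cassandra, each read or write names how many replicas must respond. The sketch below shows the standard quorum arithmetic; the function names are illustrative, and which consistency levels a given managed service exposes is not covered here.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: a majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read overlaps the latest acknowledged write."""
    return read_replicas + write_replicas > replication_factor


# With 3 replicas, QUORUM reads and writes each wait for 2 replicas,
# so the read set and write set always share at least one replica.
rf = 3
q = quorum(rf)
print(q, is_strongly_consistent(q, q, rf))
```

Reading and writing at ONE (1 + 1 replicas out of 3) fails the overlap test: it favors availability and latency, at the cost of possibly stale reads.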

No operations

  • Autoscales serverlessly—no more guessing database size or tweaking configs manually
  • Deploys in 5 minutes flat: pick a GCP region, name your database, and start
  • Handles OS and database updates automatically
  • Runs in any GCP region or availability zone Astra DB supports
  • Recovers from infrastructure hiccups via Kubernetes operators, keeping your database healthy
  • Ensures high availability with self-healing at the database level
  • Replicates data across nodes and GCP data centers for fault tolerance and zero data loss
  • Guarantees 99.9% uptime in a single region, 99.99% across multiple regions—less need for SRE heroics
  • Automates anti-entropy repairs and hourly backups (stored as snapshots for 20 days)
  • Integrates Grafana for real-time health and performance monitoring
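
For a sense of what the uptime figures above mean in practice, this small sketch converts an uptime percentage into the downtime it permits. The 30-day month is a simplifying assumption; the actual measurement window is defined by the service's SLA.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime an uptime percentage permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# 99.9% allows about 43.2 minutes per 30 days; 99.99% allows about 4.3.
for pct in (99.9, 99.99):
    print(pct, round(allowed_downtime_minutes(pct), 2))
```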

DBaaS as APIs

  • Skips upfront schema design—use Astra DB like a JSON document store with the Document API
  • Supports REST, GraphQL, and gRPC APIs for quick integration
  • Boosts cloud-native architectures with a microservices-first, API-driven approach

Developer productivity

  • Requires zero GCP infrastructure expertise—just name your database and keyspace, pick a region, and go
  • Provides drivers for Python, Java, Node.js, and more, all cloud-ready
  • Includes JDBC/ODBC drivers for BI tool hookups
  • Integrates with frameworks like Spring Boot, Spring Data, and Quarkus
  • Offers the Spark Cassandra Connector for big data workflows
  • Features a built-in CQLSH console for instant queries
  • Supplies a Postman Collection, Terraform Provider, and Ansible Playbook for CI/CD automation
  • Adds a JetBrains IDE plugin (Astra DB Data Explorer) for seamless dev workflows

Enterprise security

  • Simplifies data sovereignty with multi-region GCP setups—no replication hassles
  • Meets SOC2 compliance standards
  • Uses role-based access for tight authentication and authorization
  • Secures client connections with two-way certificate validation (mTLS) for VPN-level protection
  • Encrypts all data at rest and in transit
  • Connects your GCP VPC to Astra DB via private networking options
  • Authenticates securely with JSON web tokens (JWT)

Features of Astra DB managed Cassandra on GCP

Serverless Database Built on Apache Cassandra®

Scale database resources in and out on demand to match application requirements and traffic so that you pay only for what you use. Put the power of Cassandra in the hands of every developer without ever worrying about managing the infrastructure.

Global Scale

Replicate data across multiple data centers, availability zones, and regions. Scale up to petabytes of data without impacting performance. The Astra service is resilient and highly available to minimize both downtime and the need for site-reliability engineering.

Enterprise Security

All data is encrypted at rest and in motion. Sophisticated authentication and authorization with role-based access. Client connections use two-way certificate validation for VPN-level security from client to database. Private connectivity options like VPC peering are available upon request. JSON web token (JWT) based authentication ensures secure connectivity to your Astra DB database.

No Operations

Fully managed database and OS updates and upgrades. IaaS (Infrastructure-as-a-Service) failures are handled gracefully by a Kubernetes operator to keep databases healthy. Eliminate anti-entropy repair procedures. Autoscaling eliminates manual configuration changes and guesswork on database sizing.

FAQs

How do I deploy Apache Cassandra on Google Cloud?

  • Use Astra DB for a fully managed, no-fuss setup.
  • Spin up a self-managed cluster on Google Compute Engine (GCE) with VMs.
  • Run K8ssandra on Google Kubernetes Engine (GKE) for a self-managed, Kubernetes-native setup.

What perks come with running Cassandra on GCP?

  • Scales big and stays up with multi-region support.
  • Keeps running with zero downtime and auto-replication.
  • Ties into GCP tools like Dataflow for real-time processing.
  • Offers managed or self-hosted flexibility.

How do I connect to a Cassandra cluster on GCP?

  • Manage it via the Google Cloud Console.
  • Run CQL commands for tables and queries.
  • SSH into GCE instances directly.

Is there a managed Cassandra option on GCP?

Yes—Astra DB is a fully managed DBaaS for seamless Cassandra workloads.

Can I run Cassandra on GCP for free?

Astra DB’s free tier gives you 80 GB of storage and 20 million read/write operations monthly. New GCP users might snag extra credits too.
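
To judge whether the free tier fits a workload, it can help to turn the monthly operations allowance into a sustained rate. A quick sketch, assuming a 30-day month and evenly spread traffic (real traffic is usually bursty):

```python
def average_ops_per_second(ops_per_month, days=30):
    """Average sustained operation rate implied by a monthly allowance."""
    return ops_per_month / (days * 24 * 60 * 60)


# 20 million operations over 30 days is roughly 7.7 operations per second.
print(round(average_ops_per_second(20_000_000), 1))
```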

Is Astra DB on Google Cloud Marketplace?

Yep—buy it there, integrate it easily, and bill it through your GCP account.

How do I lock down Cassandra on GCP?

  • Set firewall rules to block outsiders.
  • Use GCP IAM for access control.
  • Encrypt data at rest and in motion.
  • Keep Cassandra updated with security patches.
