Guide | Mar 02, 2022

How to run Cassandra on Azure

Apache Cassandra® brings massive scalability and high availability to your cloud-native apps. Pair it with Azure and you get room to grow, resilient uptime, and easy integration with Azure services and managed offerings like Astra DB.

Powering giants like Netflix, Uber, and Pinterest, plus thousands of top engineering teams, this NoSQL database is battle-tested. Here, we’ll lay out the best managed and self-managed ways to run Cassandra on Azure.

Three ways to run Cassandra on Azure

Managed service: Using Astra DB on Azure

The fastest way to use Cassandra on Azure is with Astra DB, a database-as-a-service built on Cassandra, Kubernetes, Prometheus, Envoy, and other open source technologies. Astra DB simplifies cloud-native application development and requires no operations or self-management. It reduces deployment time from weeks to minutes and combines serverless autoscaling, pay-as-you-go pricing, and an open source skillset you can take with you to any cloud provider. With Astra DB on Azure you can:

  • Colocate data and applications anywhere in the world without compromising performance, availability, or accessibility
  • Replicate the database across multiple data centers, availability zones, and regions, with no leader/follower troubleshooting headaches
  • Operate in any of Astra's globally available Azure regions and availability zones
  • Deploy with no low-level Azure infrastructure knowledge: name your database and keyspace, then select a region and you’re done
  • Browse your data with the Astra DB Data Explorer JetBrains IDE plugin
  • Connect apps in your Azure virtual network to Astra DB over Azure Private Link

Self-managed service: Cassandra on Azure VMs

Some IT organizations require complete control over their systems, or are already set up for self-managed software. With self-managed virtual machines you have that control, but it comes with all the associated effort and expense, so it’s a tradeoff that should be considered carefully.

Self-managed service: K8ssandra on AKS

K8ssandra is a cloud-native distribution of Apache Cassandra® that runs on Kubernetes, including Azure Kubernetes Service (AKS). K8ssandra adds an ecosystem of tools that provide richer data APIs and automated operations alongside Cassandra. This includes metrics monitoring to promote observability, data anti-entropy services to support reliability, and backup/restore tools to support high availability and disaster recovery. As part of K8ssandra's installation process, all of these components are installed and wired together, freeing you from the tedious plumbing between them.

| Option | Managed Service: Astra DB | Self-Managed: Cassandra on Azure VMs | Self-Managed: K8ssandra on AKS |
| --- | --- | --- | --- |
| Overview | Astra DB runs Cassandra on Azure with zero fuss: a managed DBaaS built for simplicity. | You control everything with VMs, but it takes elbow grease. | K8ssandra blends Cassandra with Kubernetes for cloud-native ease. |
| Setup Time | Deploys in 5 minutes: select an Azure region and start. | Days or weeks: set up VMs, networking, and Cassandra yourself. | Hours: configure AKS, Terraform, and K8ssandra tools. |
| Scalability | Autoscales serverlessly to match your traffic on the fly. | Scale manually by adding VMs and tweaking configs. | Scales via Kubernetes; you tweak the cluster as needed. |
| Management Effort | None: Astra DB handles updates, backups, and repairs for you. | High: you manage OS, patches, and database upkeep. | Moderate: tools like Reaper cut some of the workload. |
| Cost Model | Pay-as-you-go; free tier gives 80GB and 20M ops monthly. | Fixed VM costs plus your team’s time for maintenance. | AKS cluster fees plus effort to monitor and adjust. |
| Best For | Teams craving speed and no ops, like devs or startups. | Orgs with VM skills needing total control. | Kubernetes fans wanting cloud-native flexibility. |
| Key Benefit | Scales globally across Azure regions without a hitch. | Full customization, if you’ve got the resources to run it. | Rich toolkit, including Stargate, eases Kubernetes ops. |

Get started with Astra DB on Azure

Astra DB kicks off your Apache Cassandra journey on Azure with zero stress. Sign up with your GitHub account, Google ID, or email to grab 80GB of free storage and up to 20 million read/write ops monthly, with no credit card needed for the free tier. Here’s how to get rolling:

  1. Set Up Your Astra DB Account: Hit the sign-up link, create an account, and log into the Astra DB dashboard. Click “Create Database,” name it (e.g., “myapp_data”), set a keyspace (e.g., “prod”), choose Azure as your cloud provider, and pick a region like “East US.” Hit “Create”—in under five minutes, your database is live and ready to use.
  2. Run Queries in the CQL Console: Open the built-in CQL Console from the dashboard, with no downloads or installs required. Type in Cassandra Query Language (CQL) commands like CREATE TABLE users (id uuid PRIMARY KEY, name text), then INSERT INTO users (id, name) VALUES (uuid(), 'Alex') to add data. Run SELECT * FROM users to see the row show up right in your browser.
  3. Dive into Videos and Docs: New to Cassandra? Check out Astra DB’s videos and documentation. The playlist packs short, punchy tutorials—think setting up keyspaces or connecting apps—that get you comfy fast. Spend 15 minutes there, and you’ll feel like a pro.
  4. Build with Sample Apps: DataStax hands you a stash of sample apps like e-commerce APIs and real-time trackers to speed things up. Grab one from the dashboard, tweak the code (swap in your keyspace name), and deploy it quicker than you’d make lunch. You save time without sacrificing quality.
  5. Connect Your Tools: Hook Astra DB into your stack with drivers for Python, Java, or Node.js, or use REST and GraphQL APIs for a fast tie-in. For DevOps folks, the Terraform Provider or Postman Collection automates setups. It’s plug-and-play, designed to fit your workflow.
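
As a rough illustration of the CQL from step 2, here is a minimal Python sketch that builds the same statements for a given keyspace. The helper name build_user_statements is hypothetical, and the sketch only produces CQL text; it does not connect to Astra DB. You would paste the output into the CQL Console or pass it to whichever driver you use.

```python
import uuid
from typing import List


def build_user_statements(keyspace: str, name: str) -> List[str]:
    """Build the step-2 CQL statements, qualified with a keyspace.

    Hypothetical helper for illustration only: it returns CQL text and
    never talks to a database.
    """
    user_id = uuid.uuid4()                 # client-side equivalent of CQL's uuid()
    safe_name = name.replace("'", "''")    # CQL escapes a single quote by doubling it
    return [
        f"CREATE TABLE IF NOT EXISTS {keyspace}.users (id uuid PRIMARY KEY, name text);",
        f"INSERT INTO {keyspace}.users (id, name) VALUES ({user_id}, '{safe_name}');",
        f"SELECT id, name FROM {keyspace}.users;",
    ]


for statement in build_user_statements("prod", "Alex"):
    print(statement)
```

In real application code, prefer your driver's prepared statements with bound parameters over string formatting.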

Get started with Cassandra on Azure VMs

Running Cassandra on Azure VMs gives you full control—but it’s a hands-on gig. Here’s the step-by-step to set it up right:

  1. Grab a Prebuilt ARM Template (Development): For a quick dev setup, use an Azure Resource Manager (ARM) template with Cassandra baked in. Find prebuilt ones on the Azure Marketplace—like Bitnami’s—pick a VM size (e.g., D2s_v3), deploy it, and SSH in with ssh -i your-key.pem azureuser@your-ip to test it out.
  2. Build a Custom ARM Template (Production): For testing, staging, or production, start with a trusted base like Ubuntu 20.04 and craft your own ARM template. Download Cassandra from the Apache Cassandra website, skip extras for security, and tune it for performance. This keeps it lean and locked down.
  3. Plan Your Team’s Skills: Check your squad—do they know Linux, Azure networking, and Cassandra ops? Figure out if you need training or hires. Plan for ongoing work: updates, scaling, monitoring. A small team might need a month to get solid, so carve out that time.
  4. Set Up the VM: Launch a VM (e.g., 4 vCPUs, 16 GB RAM) in the Azure Portal. Install just what Cassandra needs (Java, basic tools) and ditch the rest.
  5. Install Cassandra: On your VM, add the Cassandra repo (echo "deb http://www.apache.org/dist/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list), import the Apache Cassandra repository signing keys, refresh the package index with sudo apt update, install with sudo apt install cassandra, and start it (sudo systemctl start cassandra). Confirm it’s running with nodetool status, then test with a small table in a scratch keyspace (CREATE TABLE test (id int PRIMARY KEY);) to ensure it’s humming before going big.
  6. Configure Networking: Set up a virtual network (VNet), define subnets, and open ports (9042 for CQL, 7000 for cluster talk) in your network security group. Limit access to your app’s IP range—test with nc -zv your-ip 9042. Keep it tight and secure.
  7. Allow Monitoring Traffic: Update network security groups to let monitoring tools (e.g., Prometheus) through—open port 9090 or whatever your tool needs. Verify with a quick connection test so you can track performance without gaps.
  8. Keep It Updated: Patch the OS (sudo apt update && sudo apt upgrade) and Cassandra monthly, or sooner if security alerts drop. Restart with sudo systemctl restart cassandra after updates and check nodetool status to confirm it’s back online.
  9. Manage the Database:
    • Scale: Add VMs, update seeds in cassandra.yaml, and join them to the cluster.
    • Backup/Restore: Run nodetool snapshot and stash backups in Azure Blob Storage.
    • Disaster Recovery: Copy data across regions—test restores every quarter.
    • Capacity: Watch disk and CPU with Azure Monitor. Bump VM size if traffic spikes.
    • Repair: Schedule nodetool repair weekly to keep data consistent.
  10. Monitor Performance: Use Azure Monitor or Prometheus to spot failed ops such as timeouts or slow queries. Tweak JVM settings such as heap size in jvm-server.options (jvm.options on Cassandra 3.x) if memory lags. Keep it smooth under pressure.
  11. Stay Current with Azure: Track Azure VM updates—like new instance types or networking perks. Swap to a beefier VM (e.g., D4s_v3) if it saves cash or boosts speed. Test changes in a staging VM first.
  12. Secure the Setup: Use Azure RBAC for access, turn on encryption in cassandra.yaml, and check logs weekly. Lock down the Azure security policy—only trusted IPs hit your cluster. Stay vigilant.
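
The networking step tests reachability with nc -zv. A small Python sketch of the same check is below, in case you want to script it across several nodes. The function names are hypothetical, and 127.0.0.1 is a placeholder for your node's address; a "closed" result can mean either that nothing is listening or that a network security group rule is blocking the port.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like nc -zv)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_cassandra_ports(host: str, ports: Iterable[int] = (9042, 7000)) -> Dict[int, bool]:
    """Check the CQL (9042) and inter-node (7000) ports named in the steps above."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # Placeholder address: replace with your VM's private or public IP.
    for port, is_open in check_cassandra_ports("127.0.0.1").items():
        print(f"port {port}: {'open' if is_open else 'closed or filtered'}")
```

Run it from the machine that hosts your application, since that is the network path your security group rules need to allow.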

Get started with K8ssandra on AKS

K8ssandra runs Cassandra on Azure Kubernetes Service (AKS) with a cloud-native edge and built-in tools. Here’s how to fire it up:

  1. Install Terraform: Download Terraform from its site, install it (sudo unzip terraform.zip -d /usr/local/bin), and check terraform -version. You’ll use it to provision the AKS cluster in a later step.
  2. Clone the K8ssandra Project: Run git clone https://github.com/k8ssandra/k8ssandra-terraform to grab the setup scripts. Jump into the folder (cd k8ssandra-terraform)—it’s your starting point.
  3. Set Up az CLI: Install the Azure CLI (curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash), log in with az login, and set your subscription (az account set --subscription your-sub-id). This links you to Azure.
  4. Configure Environment Variables: Export your Azure creds: export AZURE_SUBSCRIPTION_ID=your-id, export AZURE_TENANT_ID=your-tenant, export AZURE_CLIENT_ID=your-client. Add them to ~/.bashrc so they stick.
  5. Provision AKS Infrastructure: In the cloned repo, run terraform init then terraform apply. Set up a 3-node cluster (e.g., Standard_D2s_v3) in a region like “West US”—takes about 15 minutes to spin up.
  6. Grab Kubeconfig: Pull your cluster creds with az aks get-credentials --resource-group your-group --name k8ssandra-cluster. Test it with kubectl get nodes—you should see all nodes “Ready.”
  7. Install K8ssandra: Add the Helm repo (helm repo add k8ssandra https://helm.k8ssandra.io/stable), update it (helm repo update), and deploy (helm install k8ssandra k8ssandra/k8ssandra). Check pods with kubectl get pods.
  8. Deploy with Helm: Customize with a values.yaml file—set pod count or storage (e.g., cassandra.persistence.size=50Gi)—then run helm upgrade k8ssandra k8ssandra/k8ssandra -f values.yaml. Watch it roll out smoothly.
  9. Retrieve Superuser Credentials: Get creds with kubectl get secret k8ssandra-superuser -o jsonpath="{.data.username}" | base64 -d (and again for password). Log into CQLSH with these to manage your cluster.
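
Kubernetes stores secret values base64-encoded, which is why the last step pipes through base64 -d. As a sketch, the Python below decodes both fields at once from the JSON that kubectl get secret k8ssandra-superuser -o json would print. The helper name decode_secret is hypothetical, and the sample values are made up for illustration rather than taken from a real cluster.

```python
import base64
import json
from typing import Dict


def decode_secret(secret_json: str) -> Dict[str, str]:
    """Decode every base64 value in the .data map of a Kubernetes Secret."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


# Made-up sample shaped like `kubectl get secret ... -o json` output.
sample = json.dumps({
    "kind": "Secret",
    "data": {
        "username": base64.b64encode(b"k8ssandra-superuser").decode("ascii"),
        "password": base64.b64encode(b"example-password").decode("ascii"),
    },
})

credentials = decode_secret(sample)
print(credentials["username"])
```

Treat the decoded password like any other credential: avoid printing it or writing it to logs.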

Which one is the most efficient way of running Cassandra on Azure?

This answer depends on your requirements, your existing investments, your staff and their skills - a host of factors.

In general, we recommend Astra DB for the vast majority of Cassandra use cases. You can be ready to go in minutes, freed from operational, security and scalability concerns.

All but the most demanding, security-conscious applications will be served by environments like Astra DB that are already compliant with common security standards, saving months or even years of effort, to say nothing of expense.

Startups and enterprises alike who do not want to, or cannot, dive deep into database administration and configuration should opt for Astra DB.

Self-managing databases on Kubernetes is less efficient than DBaaS, but may be driven by preexisting organizational proficiency with Kubernetes. Managed Kubernetes services like Azure Kubernetes Service (AKS), combined with K8ssandra, not only make running system-of-engagement databases on Kubernetes possible but can significantly ease the burden on SRE/Ops teams.

Self-managing on IaaS is the least efficient option relative to DBaaS, but may be driven by a need to self-manage for regulatory reasons or to interoperate with proprietary or custom systems. It may also suit an existing application that is being migrated to the cloud as-is: your application may simply not require, or be ready for, a cloud-native architecture.

Why Astra DB?

Global Scale

  • Scales to petabytes without a hiccup.
  • Places data and apps anywhere worldwide—no dips in speed or uptime.
  • Replicates across Azure regions and zones, skipping leader/follower messes.
  • Splits compute and storage for cost-smart scaling, up under load or down to zero when idle.
  • Tweaks consistency to balance availability and accuracy across Cassandra nodes.
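
To make the consistency tradeoff concrete, here is a short sketch of the arithmetic behind Cassandra's tunable consistency: a quorum is a majority of replicas, and reads are guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor. The function names are illustrative, not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (QUORUM)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read overlaps at least one replica that took the write."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(rf, q, q))   # True: QUORUM reads + QUORUM writes
print(is_strongly_consistent(rf, 1, 1))   # False: ONE + ONE favors availability
```

Lower levels such as ONE keep the database available when more replicas are down, at the cost of possibly reading stale data.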

No Operations

  • Autoscales serverlessly—no sizing headaches or manual tweaks.
  • Launches in 5 minutes: pick an Azure region, name it, and start.
  • Handles updates, backups, and repairs automatically.
  • Runs in any Azure region Astra DB supports.
  • Manages infra hiccups with Kubernetes to keep things smooth.
  • Stays up with self-healing and multi-node replication for zero data loss.
  • Delivers 99.9% uptime in one region, 99.99% across many—no SRE overtime.
  • Ties in Grafana for real-time health checks.
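
For a sense of what the uptime figures above mean in practice, this sketch converts an availability percentage into a yearly downtime budget. It is plain arithmetic over a 365-day year, not a statement of any service's SLA terms.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


single_region = downtime_hours_per_year(99.9)
multi_region = downtime_hours_per_year(99.99)
print(f"99.9%  -> {single_region:.2f} hours/year")        # 8.76 hours/year
print(f"99.99% -> {multi_region * 60:.1f} minutes/year")  # 52.6 minutes/year
```

Each extra nine cuts the downtime budget by a factor of ten.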

DBaaS as APIs

  • Skips schema prep—use it like a JSON store with the Document API.
  • Offers REST, GraphQL, and gRPC for quick integration.
  • Powers cloud-native setups with a microservices-first approach.

Developer Productivity

  • Needs no Azure expertise—name your database, pick a region, done.
  • Packs drivers for Python, Java, and more, plus a CQLSH console.
  • Works with Spring Boot, Quarkus, and Spark out of the box.
  • Adds Terraform and Postman for DevOps ease.

Enterprise Security

  • Simplifies multi-region data rules without replication hassles.
  • Locks down with SOC2, mTLS, and JWT authentication.
  • Encrypts all data, end-to-end, with Azure Private Link options.

Features of Astra DB managed Cassandra on Azure

Serverless Database Built on Apache Cassandra®

Scale database resources in and out on demand to match application requirements and traffic so that you pay only for what you use. Put the power of Cassandra in the hands of every developer without ever worrying about managing the infrastructure.

Global Scale

Data is replicated across multiple data centers, availability zones, and regions. Scale up to petabytes of data without impacting performance. The Astra service is resilient and highly available, minimizing both downtime and the need for site-reliability engineering.

Enterprise Security

All data is encrypted at rest and in motion. Sophisticated authentication and authorization with role-based access. Client connections use two-way certificate validation for VPN-level security from client to database. Private connectivity options like virtual network peering are available upon request. JSON Web Token (JWT) based authentication ensures secure connectivity to your Astra DB database.

No Operations

Fully managed database and OS updates and upgrades. IaaS (Infrastructure-as-a-Service) failures handled gracefully by K8s operator to keep databases healthy. Eliminate anti-entropy repair procedures. Auto scaling eliminates manual configuration changes and guesswork on database sizing.

FAQs

1. How do I deploy Apache Cassandra on Azure?

To deploy Cassandra on Azure, you can:

  • Use Astra DB, a fully managed Cassandra service, requiring no operational overhead.
  • Set up a self-managed Cassandra cluster on Azure Virtual Machines (VMs), typically on an Ubuntu or other Linux virtual machine (Cassandra 4.0 and later no longer support Windows).
  • Use Azure Managed Instance for Apache Cassandra, which provides a scalable and highly available solution with full control over your Cassandra nodes.

2. How do I access Cassandra on Azure?

Once your Cassandra cluster is deployed, you can access it through:

  • The Azure Portal to manage configuration, scaling, and security.
  • Cassandra Query Language (CQL) commands to interact with databases, tables, and nodes.
  • Azure Private Link to connect to Cassandra securely within the same virtual network.

3. What are the benefits of running Cassandra on Azure?

Running Apache Cassandra on Azure provides:

  • High availability and fault tolerance with multi-region replication.
  • Seamless integration with Azure services.
  • Custom scalability with adjustable throughput, storage, and compute resources.
  • Performance optimization tools, such as metrics for cache misses, disk performance, and latency tracking.

4. Is Astra DB free on Azure?

Yes, Astra DB on Azure offers a free tier with 80GB of storage and up to 20 million read/write operations per month. Astra DB is serverless, meaning you are only billed for the resources you use beyond the free allocation.
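
As a back-of-the-envelope check, the sketch below estimates whether a workload stays inside the free-tier figures quoted above (80GB of storage, 20 million operations per month). The function name and the 30-day month are assumptions for illustration; actual billing rules are not modeled here.

```python
FREE_STORAGE_GB = 80            # free-tier storage quoted in this guide
FREE_OPS_PER_MONTH = 20_000_000  # free-tier read/write operations quoted in this guide
SECONDS_PER_DAY = 24 * 60 * 60


def fits_free_tier(storage_gb: float, avg_ops_per_second: float, days: int = 30) -> bool:
    """Rough check: does a steady workload stay within the quoted free allocation?"""
    monthly_ops = avg_ops_per_second * days * SECONDS_PER_DAY
    return storage_gb <= FREE_STORAGE_GB and monthly_ops <= FREE_OPS_PER_MONTH


# Sustained rate that would use up the monthly allowance in a 30-day month.
break_even_rate = FREE_OPS_PER_MONTH / (30 * SECONDS_PER_DAY)
print(f"{break_even_rate:.1f} ops/second")  # 7.7 ops/second

print(fits_free_tier(storage_gb=10, avg_ops_per_second=5))   # True
print(fits_free_tier(storage_gb=10, avg_ops_per_second=10))  # False
```

In other words, the free allocation covers a steady load of a little under 8 operations per second.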

5. How do I migrate an existing Cassandra workload to Azure?

To migrate Cassandra workloads to Azure:

  • Use Azure Migrate to move on-premises Cassandra clusters to Azure Virtual Machines.
  • Export and import Cassandra databases using CQL commands and Azure Data Factory.
  • Transition to Azure Managed Instance for Apache Cassandra for easier scaling and management.

6. How do I secure a Cassandra deployment on Azure?

To ensure security and compliance, follow these best practices:

  • Use Azure Private Link to keep Cassandra traffic within a private IP range.
  • Configure firewall rules and role-based access control (RBAC) in the Azure Portal.
  • Enable encryption for data at rest and in transit to protect sensitive information.
  • Regularly monitor Cassandra logs and performance metrics using Azure Monitor.