Company · October 26, 2022

Going Driverless with Stargate v2 and the Cloud-Native Apache Cassandra Database

Stargate v2 puts Apache Cassandra at the forefront of the driverless (r)evolution in cloud-native and real-time databases, alongside etcd and ClickHouse, PrestoDB, Apache Pinot, Apache Flink, and others. Real-time applications on the cloud won’t be depending on native drivers to make app-to-data connections. They can’t.

What is Stargate and why is it important?

App development and infrastructure in modern computing have become progressively more invisible, obscured by the cloud, SaaS, and web APIs. Code to cloud is fairly straightforward: app dev frameworks are designed with distributed computing, microservices, and service-oriented architecture (SOA) in mind. Some frameworks are even advancing reactive development alongside imperative code. Testing, release, and deployment automation are better than ever. API-first or API-only apps will be the default by 2024, rendering traditional SaaS/custom apps “legacy.” However, data is arguably the reason that all the apps and infrastructure even exist. Yet even today, data remains quite difficult to work with, much less to derive insight from.

The challenge is particularly acute when it comes to real-time data. While real-time data is expected to grow by 10X over the next three years, outpacing the growth of the overall data market, most organizations still struggle to develop applications that take advantage of real-time data. A critical gap remains between expectation and reality. The prediction that more than 30% of data will be real-time stands in contrast to the reality that most organizations still struggle with the cost, complexity, and scarcity of skills needed to truly embrace real-time. That means the benefits of real-time data are typically only available to a select few high-performing organizations that operate on a global scale.   

In our most recent State of the Data Race research, exploring the strategies and results of more than 500 organizations, DataStax discovered that 71% of organizations can directly link revenue growth to the use of real-time data. More importantly, those organizations making real-time data a strategic focus are 2.3X more likely to experience a “transformative impact” on revenue. However, real-time infrastructure—including databases—must adapt for the web, cloud and SaaS. 

Drivers have been the workhorse of connectivity between apps and databases for decades. Yet cloud is changing all that, putting native drivers out to pasture. The cloud has chosen the web (HTTP) as its networking foundation. In cloud-native environments, where HTTP-based Web APIs are more common, many of the operational tasks are abstracted away and handled automatically on behalf of the application. For example, load balancing, health checks, and TLS termination are intrinsic to most cloud environments; even retries can be configured within the environment.  

Traditional database drivers offer real-time performance levels, but while they can be made cloud-aware, they aren’t cloud native. Native drivers encompass tasks like connection pooling, TLS, authentication, load balancing, retry policies, write coalescing, compression, health checks, and more.  Building native drivers into an otherwise cloud native, real-time application can have real and negative consequences. Also, when developers must use proprietary driver APIs, they expend precious skill-building energy on network management configuration, distracted from business logic. While data gateways have emerged to modernize databases for web APIs, independently of application development frameworks, adoption has been slow for high-performance systems due to the gap in wire performance.  
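To make one item on that list concrete, here is a minimal sketch of a hand-rolled retry policy with exponential backoff, the kind of plumbing an application ends up owning when it manages its own connections. The names `RetrySketch` and `withRetries` are hypothetical and exist only for illustration. Real Cassandra drivers expose their own retry policy interfaces, and this is not one of them.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical helper: run the call up to maxAttempts times,
    // doubling the pause after each failed attempt.
    public static <T> T withRetries(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;          // exponential backoff
                }
            }
        }
        throw last; // every attempt failed, surface the final error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated query that fails twice before succeeding.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated timeout");
            }
            return "row";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a cloud-native environment, much of this logic can move out of application code and into the platform's configuration.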

REST is by far the most popular Web API style, but it’s inherently slow. gRPC has emerged as the cloud native, Web API performance leader. gRPC outperforms REST over HTTP/1.1 by 7X-10X, and outperforms REST over HTTP/2 by 50%-70%. Any potential successor to high-performance native drivers, however, must first offer equivalent wire speed to compete in real-time data. gRPC has this potential and offers other advantages over native drivers:

  • Much of the networking configuration is handled automatically
  • Less potential for downtime caused by updating driver configuration to mirror network configuration changes
  • Most importantly, broad ecosystem and language support

This is the spark of the driverless (r)evolution. 

Introducing Stargate v2: A Cloud-Native Data API Gateway

While open source data gateways offer developer productivity for transforming databases into a variety of Web APIs, few offer gRPC. Even fewer support both native drivers and gRPC alongside slower, but easier-to-use web APIs.  Stargate v2 is a unique data API gateway that elevates gRPC to native-driver-level performance while exposing Apache Cassandra over other web APIs and a JSON Document interface. It completes the transformation of the Apache Cassandra powerhouse database to a cloud service that simplifies data access and management.

Stargate v2 is a gateway to better cloud native, real-time systems and provides developers, operators, and the Apache Cassandra ecosystem with:

  • High-performance gRPC that’s as fast as native drivers, with less network management configuration
  • New node types that bring modular, service-oriented architecture to Cassandra, improving performance and performance tuning
  • Better extensibility for new APIs and data serialization formats, mirroring Cassandra 4.1’s focus on extensibility 

Stargate v2’s gRPC API offers performance equivalent to native drivers on an Astra DB free tier (5,500 ops/sec for native drivers and 5,500 ops/sec for gRPC) in sustained testing. The Stargate engineering team measured this with an open source tool called NoSQLBench (modified for gRPC support).


Figure 1. The baseline for the CQL API was 5,500 operations per second from a basic Astra DB cluster.


Figure 2. Stargate v2 gRPC API held a stable 5,500 operations per second from a basic Astra DB cluster.

This blistering speed is due to Stargate v2’s exploitation of every aspect of the gRPC protocol: unary, client-side streaming, server-side streaming, bidirectional calls, and of course, its superior data serialization with protobuf.
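As a rough illustration of why protobuf's binary serialization is compact, the sketch below encodes an integer as a base-128 varint, the encoding protobuf uses for integer fields on the wire, and compares it with the same number written as text. This is not Stargate code. The class and method names are hypothetical, and a real protobuf message also carries a small tag per field, just as JSON carries a key name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class VarintSketch {
    // Encode a non-negative value as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last.
    public static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long opsPerSecond = 5500;
        byte[] binary = encodeVarint(opsPerSecond);
        byte[] text = Long.toString(opsPerSecond).getBytes(StandardCharsets.UTF_8);
        System.out.println("varint bytes: " + binary.length); // 2
        System.out.println("text bytes: " + text.length);     // 4
    }
}
```

The saving on a single number is small, but it applies to every value in every row that crosses the wire.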

Cloud-native development with Stargate v2

Stargate v2 brings the limitless scale of Apache Cassandra to your apps, serverless functions, and services over gRPC. With wire-speed parity in place, suddenly gRPC’s other advantages have immediate and impactful relevance, sparking a change in direction for how high-performance, cloud native systems are built.

“We’re a global team, with clients and offices located across 6 continents,” said Deepak Kumar, VP of Engineering, SHIELD. “DataStax’s Astra DB provides an ideal managed Cassandra database service to underpin our fraud library of 7 billion devices and 1 billion user accounts with speed and scale. When we needed a strong Go driver to fit our development framework, we turned to the Stargate gRPC API. It’s high performing and easy to use, which empowered us to continually prove cutting edge AI and device fingerprinting technologies to stop fraud.”

gRPC was designed for cloud and HTTP/2 from the ground up. Unlike native drivers, Stargate’s gRPC API handles much of the network management configuration for you, so it’s easier to use (and learn). Native drivers  must explicitly account for tasks like connection pooling, TLS, authentication, load balancing, retry policies, write coalescing, compression, health checks, etc.—distracting developers from business logic. Stargate’s v2 gRPC interface is similar to the existing native Cassandra drivers in one respect though: it’s a transport for CQL queries. Transporting CQL over gRPC preserves Cassandra developers’ existing CQL skills.

Let’s see this in action in some Java code samples:

  • Astra101 = default gRPC client connected to Astra
  • AstraSDK = Astra SDK connected to Astra
  • Stargate101 = default gRPC client connected to local Stargate in Docker
  • StargateSDK = Stargate SDK connected to local Stargate in Docker

Now developers have the best of both worlds: an approach that is easier to learn than native drivers, with performance vastly superior to other web APIs. Given their tenure, ignoring apps built on native drivers isn’t very pragmatic, so data gateways must also support and add value to them, alongside web APIs.

Stargate v2 is now available on the Postman API Network, the most popular way to create, share, test and document Web APIs. Postman users can quickly develop, test, and learn Cassandra via Stargate APIs like gRPC, GraphQL, REST,  or a schemaless JSON Document API with Stargate.  Learn more about it on the DataStax Medium Blog.
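For a feel of what driverless access looks like from plain Java, here is a minimal sketch that builds a REST request for a single row using only the JDK's built-in HTTP client types (Java 11+). The base URL, the `library` keyspace, the `books` table, and the token value are assumptions for illustration. The `/v2/keyspaces/...` path and the `X-Cassandra-Token` header follow the Stargate REST API as we understand it, so check them against the documentation for your deployment.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class StargateRestSketch {
    // Build (but do not send) a GET request for one row by primary key.
    // All argument values are supplied by the caller; nothing here is Stargate-specific code.
    public static HttpRequest rowRequest(String baseUrl, String keyspace, String table,
                                         String primaryKey, String authToken) {
        URI uri = URI.create(baseUrl + "/v2/keyspaces/" + keyspace + "/" + table + "/" + primaryKey);
        return HttpRequest.newBuilder(uri)
                .header("X-Cassandra-Token", authToken) // assumed auth header name
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical local endpoint and sample data names.
        HttpRequest request = rowRequest("http://localhost:8082", "library", "books", "42", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. No database driver is involved at any point.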

Cloud-native operations with Stargate v2

The new Stargate v2 architecture modernizes Cassandra by moving from monolithic to modular, improving performance and operational control. Stargate v2 uses dedicated node types for data storage, query coordination, and API services. This means that storage, native driver/gRPC query coordination, and Web API service nodes are independently deployable and scalable by operations. Ops can locate workloads on most optimal hardware, and/or scale just what’s bottlenecked—not the whole cluster—to improve performance.

Stargate v2 makes each of these node types deployable as its own Kubernetes pod. This improves Kubernetes and K8ssandra compatibility for cloud native operations teams. Stargate itself deploys anywhere, including bare metal, VMs, Docker, Docker Compose, Kubernetes, K8ssandra, DataStax Enterprise, and DataStax Astra DB.  

Also, when using gRPC for app-to-database connections, application resilience is improved over drivers. There’s much less need to update driver configuration to mirror network configuration changes, an update that can trigger application downtime.

Lastly, Stargate v2 will be available soon in the Google Cloud Marketplace and AWS Marketplace for GKE/EKS. A guided marketplace experience will help you enable Stargate for your existing Cassandra cluster(s) under your existing cloud provider account.

Extensibility with Stargate v2

Stargate APIs aren’t just about developer productivity—they can also be ways to provide secondary database models, API compatibility with other databases, adopt new data serialization formats, and integrate apps and services. That means new consolidation, integration and migration opportunities for Cassandra-based systems.

It’s an exciting time for the Cassandra ecosystem: Cassandra 4.1 has new extension points for Storage, Network Encryption, Authentication, Schema Storage, and Guardrails. Stargate mirrors this extensibility focus so you can extend Stargate for APIs and data serialization formats your systems use. For example, Stargate v1 introduced a JSON Document API for Cassandra, and CMU researchers have begun experimenting with a DynamoDB API for Stargate on the new v2 architecture.

The original Stargate codebase was tightly coupled to the persistence engine, making it hard to extend Stargate without knowing the entire code base. Stargate v2 offers a new extensibility API that makes it easy to create additional API services using gRPC. Supplying HTTP-based APIs allows developers to extend the architecture more quickly compared to the traditional CQL binary protocol and drivers. 

Wrapping up

With Stargate v2, you can connect applications, services, and functions to Apache Cassandra over HTTP APIs and drivers, at Cassandra’s limitless scale. Go driverless with Stargate v2’s new high-performance gRPC implementation! It’s just as fast as native database drivers, and with so much of the networking configuration done by gRPC automatically, it is much easier to use (and learn). Run new workloads and new application types on the same fast, reliable, and scalable Cassandra foundation in the cloud.

Come learn with us!

DataStax Developers Workshop - October 26, 2022

Join DataStax Devs to learn more about Stargate in a hands-on workshop for developers. We’ll show you how to build a clone of the Netflix Homepage in React, how to interact with the database using GraphQL, and how to implement features such as infinite scroll and paging.

Register now for the Build a Netflix Clone workshop on October 26, 2022


Postman Livestream - November 3, 2022

Join the Postman YouTube livestream to learn more about the driverless (r)evolution in high-performance systems with gRPC, hosted by Postman’s Ian Douglas and DataStax’s Jeff Carpenter!

We’re here to help you build something awesome, and look forward to your feedback. 
