December 11, 2018

Introducing the DataStax Apache Kafka® Connector

UPDATED Dec. 18, 2019: As part of our ongoing support of the Cassandra community, DataStax has made the DataStax Apache Kafka® Connector freely available for open-source Cassandra users.


Built by the team that authors the DataStax Drivers for Apache Cassandra®, the DataStax Apache Kafka Connector builds on that team's best practices for ingesting data into DataStax Enterprise (DSE) while delivering enterprise-grade resiliency and security.

Modern architectures are made up of a diverse landscape of technologies, each serving its purpose within the data ecosystem. Apache Kafka fits naturally as a distributed queue for event-driven architectures, acting as a buffer layer that transports messages to the database and surrounding technologies.

There is no better solution in the market to complement Apache Kafka than DSE. As an operational data layer and hybrid cloud database, DSE delivers a multi-model persistent data store that never goes down and scales horizontally, providing the real-time access needed to serve enriched, personalized applications.

Automatic Ingest from Kafka to DSE

The DataStax Apache Kafka Connector is the bridge that allows data to move seamlessly from Apache Kafka to DSE in event-driven architectures. In Kafka Connect terms, the connector is a sink, and its key features are market-leading performance, flexibility, security, and visibility. All of this is offered with DataStax Enterprise and Apache Cassandra at no additional cost.

As mentioned, the DataStax Apache Kafka Connector is built by the experts who develop and maintain the DataStax drivers for Apache Cassandra. Without going into the weeds, the connector leverages the same techniques used in the DataStax Bulk Loader, which has proven to outperform all other bulk loading solutions for Cassandra.

Flexibility

The design of this sink accounts for the varying data structures found in Apache Kafka. Its selective mapping functionality lets the user specify which Kafka fields are written to which DSE columns, so a single connector instance can read from multiple Apache Kafka topics and write to many DSE tables, removing the burden of managing several connector instances. Whether the Apache Kafka data is in Avro, JSON, or string format, the connector's advanced parsing handles this wide range of data inputs.
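To make selective mapping concrete, here is a minimal Python sketch under stated assumptions; it is not the connector's implementation. The `column=value.field` mapping strings are modeled on the style the connector's documentation uses, while `parse_mapping`, `extract`, `to_rows`, and the `market.*` tables are hypothetical. It shows one record from one topic fanning out to two tables, with unmapped fields dropped.

```python
from typing import Any, Dict


def parse_mapping(spec: str) -> Dict[str, str]:
    """Turn 'col1=key, col2=value.f2' into {'col1': 'key', 'col2': 'value.f2'}."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        column, _, source = pair.partition("=")
        mapping[column.strip()] = source.strip()
    return mapping


def extract(record: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'value.price' against a record."""
    node: Any = record
    for part in path.split("."):
        node = node[part]
    return node


def to_rows(record: Dict[str, Any], table_mappings: Dict[str, str]) -> Dict[str, Dict[str, Any]]:
    """Build one row per target table; fields not named in a mapping are dropped."""
    return {
        table: {column: extract(record, source) for column, source in parse_mapping(spec).items()}
        for table, spec in table_mappings.items()
    }


# One Kafka record (key plus JSON-like value) fanned out to two hypothetical tables.
record = {"key": "AAPL", "value": {"price": 170.5, "exchange": "NASDAQ", "trader": "t42"}}
mappings = {
    "market.prices_by_symbol": "symbol=key, price=value.price",
    "market.prices_by_exchange": "exchange=value.exchange, symbol=key, price=value.price",
}
for table, row in to_rows(record, mappings).items():
    print(table, row)
```

Fields that no mapping names (here `trader`) never reach DSE, which is what lets one connector instance serve topics whose payloads do not match the table schemas one-to-one.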

Security

One of the core value propositions of DSE is its enterprise-grade security. With built-in SSL, LDAP/Active Directory, and Kerberos integration, DSE contains the tools needed to meet strict compliance requirements for client-to-server connections. These security features are also included in the DataStax Apache Kafka Connector, ensuring that the connection between the connector and the data store is secure.
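As a rough illustration of what enabling these options involves, the Python sketch below assembles security-related connector settings. The property names (`ssl.provider`, `ssl.truststore.*`, `auth.provider`, `auth.username`, `auth.gssapi.*`) and the provider values `JDK`, `DSE`, and `GSSAPI` reflect the connector's 1.x configuration reference as we recall it and are assumptions; check them against the documentation for your connector version. The helper `security_settings` is hypothetical, and the paths and credentials are placeholders. It covers SSL combined with either username/password or Kerberos authentication.

```python
from typing import Dict, Optional


def security_settings(
    truststore_path: str,
    truststore_password: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    keytab: Optional[str] = None,
    principal: Optional[str] = None,
) -> Dict[str, str]:
    """Return SSL settings plus either Kerberos or username/password auth.

    ASSUMPTION: property names and provider values mirror the connector's
    1.x configuration reference as we recall it; verify before use.
    """
    settings = {
        # Encrypt connector-to-DSE traffic using the JDK's SSL implementation.
        "ssl.provider": "JDK",
        "ssl.truststore.path": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    if keytab and principal:
        # Kerberos (GSSAPI): authenticate with a keytab file and a principal.
        settings.update({
            "auth.provider": "GSSAPI",
            "auth.gssapi.keyTab": keytab,
            "auth.gssapi.principal": principal,
            "auth.gssapi.service": "dse",  # assumed Kerberos service name
        })
    elif username and password:
        # Username/password authentication.
        settings.update({
            "auth.provider": "DSE",
            "auth.username": username,
            "auth.password": password,
        })
    else:
        raise ValueError("supply either username/password or keytab/principal")
    return settings


# Placeholder paths and credentials, for illustration only.
print(security_settings("/path/to/truststore.jks", "changeit",
                        username="kafka_sink", password="secret"))
```

The resulting dictionary would be merged into the rest of the connector configuration. In a real deployment, avoid keeping passwords in plain-text configs; Kafka Connect's config providers are one way to externalize them.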

Visibility

Regarding visibility and error handling, we know that in complex distributed environments, failures are bound to happen. The engineering team at DataStax took special care to account for these error scenarios, and all of the intelligence of the DataStax Drivers is applied in the DataStax Apache Kafka Connector. The connector also includes metrics that give operators visibility into failure rates and latency as messages pass from Kafka to DSE.
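The connector reports its indicators through JMX. As a hypothetical stand-in (not the connector's metric names or code), the Python class below shows what failure-rate and latency indicators measure: every write attempt is timed, failures are counted rather than raised, and a nearest-rank 99th-percentile latency is derived from the samples. `SinkMetrics`, `ok_write`, and `failing_write` are illustrative names.

```python
import math
import time
from typing import Callable, List


class SinkMetrics:
    """Hypothetical counters for one sink: records, failures, write latency."""

    def __init__(self) -> None:
        self.records = 0
        self.failures = 0
        self.latencies_ms: List[float] = []

    def timed_write(self, write: Callable[[], None]) -> bool:
        """Run one write, time it, and count it as a success or a failure."""
        start = time.perf_counter()
        try:
            write()
            return True
        except Exception:
            self.failures += 1
            return False
        finally:
            # Runs on both paths, so every attempt is counted and timed.
            self.records += 1
            self.latencies_ms.append((time.perf_counter() - start) * 1000.0)

    def failure_rate(self) -> float:
        return self.failures / self.records if self.records else 0.0

    def latency_p99_ms(self) -> float:
        """Nearest-rank 99th percentile of recorded latencies."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = math.ceil(0.99 * len(ordered))
        return ordered[rank - 1]


def ok_write() -> None:
    pass


def failing_write() -> None:
    raise TimeoutError("simulated DSE write timeout")


metrics = SinkMetrics()
for write in (ok_write, ok_write, ok_write, failing_write):
    metrics.timed_write(write)
print(metrics.records, metrics.failures, metrics.failure_rate())  # 4 1 0.25
```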

Available Now

We are excited to release this connector, which improves the interoperability of DSE within the data ecosystem and supports DSE versions 5.0 and above. Stay tuned for upcoming posts on advanced usage of this sink. In the meantime, see our documentation and examples for more information, and download the new connector today to try it in your own environment.


Details of Connector Functionality

Fully supported by DataStax: DataStax fully supports the connector and provides expert services for it.

Consume Kafka primitive data format: The connector accepts Kafka record data in primitive-type form.

Consume Kafka JSON data format: The connector accepts Kafka record data that is valid JSON.

Consume Kafka Avro data format: The connector accepts Kafka record data that is valid Avro.

Pluggable Connect converters: The connector works with StringConverter, JsonConverter, AvroConverter, ByteArrayConverter, and the numeric converters, as well as custom data converters. Note that the data must be produced in the format that the connector's converter expects.

Provides JMX metrics: The connector exposes JMX metrics for record and failure counts and for latency.

Runs within Connect Worker: The connector is deployed in the Kafka Connect framework.

At-least-once guarantee: The connector stores its offsets in Kafka and picks up where it left off if restarted. This minimizes repeated work, although in some situations writes to DSE are retried, for example when many records are in a single failed batch. The connector ensures that no records are missed.

Standalone mode support: The connector is deployed in the Kafka Connect framework and works in standalone mode (meant for dev/test).

Distributed mode / HA support: The connector is deployed in the Kafka Connect framework and works in distributed mode (meant for production).

Flexible Kafka topic => DSE table mapping: The connector provides flexible mapping functionality to control which fields are pulled from Kafka and written to DSE.

Single Kafka topic => multiple DSE tables: The connector enables common DSE denormalization patterns by allowing a single topic to be written to many DSE tables.

Connector throttling + parallelism: The connector has built-in throttling that limits the maximum number of concurrent requests a single connector instance can send. Parallelism comes from the integration with the Kafka Connect distributed framework and from the connector's asynchronous internals.

Flexible date/time/timestamp formats: The connector accounts for the common case where separate teams write to the same Kafka deployment and use different formats for date/time fields.

Configurable consistency level: The connector allows configuring the DSE consistency level on a per topic-table basis.


Row-level TTL: The connector allows configuring DSE row-level TTL on a per topic-table basis.

Deletes: The connector allows configuring DSE deletes on a per topic-table basis.

Handling of nulls: The connector allows configuring DSE null handling on a per topic-table basis.

Error handling: The connector has built-in error handling for various failure scenarios, including bad mappings and DSE write issues.

Offset management: The connector leverages the Kafka Connect framework to manage offsets, storing them in Kafka.

Connector => DSE SSL: The connector allows configuring the connection to DSE with SSL.

Connector => DSE username/password: The connector allows configuring the connection to DSE with username/password authentication.

Connector => DSE LDAP/Active Directory: The connector allows configuring the connection to DSE with LDAP/Active Directory.

Connector => DSE Kerberos: The connector allows configuring the connection to DSE with Kerberos.

Configurable DSE write timeout: The connector allows configuring the write timeout to DSE.

Connector => DSE compression: The connector allows configuring the connection to DSE to use compression.
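The at-least-once guarantee and offset management described above can be illustrated with a small simulation. This is a hypothetical model, not the connector's code: a Python dict stands in for a DSE table, `committed` stands in for the offset Kafka Connect stores in Kafka, and `run_sink` injects one failure part-way through a batch. Because the offset advances only after a whole batch succeeds, the failed batch is replayed from the last committed offset, so some rows are written twice but none are skipped. Since a CQL INSERT is an upsert on the primary key, replaying plain column values (not counters or list appends) leaves the table in the same final state.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (primary key, column value)


def run_sink(log: List[Record], batch_size: int, fail_once_at: int) -> Tuple[Dict[str, int], int]:
    """Replay `log` into a dict 'table', committing offsets only after full batches.

    `fail_once_at` is the offset where one simulated write failure occurs
    (use -1 for no failure). Returns the final table and the number of writes.
    """
    table: Dict[str, int] = {}
    committed = 0  # next offset to consume; advanced only after a batch succeeds
    writes = 0     # total row writes, including replays
    failed = False
    while committed < len(log):
        batch = log[committed:committed + batch_size]
        for i, (key, value) in enumerate(batch):
            if committed + i == fail_once_at and not failed:
                failed = True
                break  # batch fails part-way; the offset is not committed
            table[key] = value  # upsert by primary key, like a CQL INSERT
            writes += 1
        else:
            committed += len(batch)  # whole batch succeeded: commit the offset
    return table, writes


log = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
table, writes = run_sink(log, batch_size=3, fail_once_at=2)
print(table == dict(log), writes)  # True 7
```

With `fail_once_at=-1` the same log takes exactly five writes. The simulated failure costs two duplicate writes (`a` and `b`) but no missing rows, which is the trade-off the at-least-once guarantee describes.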

 
