Introducing the DataStax Apache Kafka® Connector
UPDATED Dec. 18, 2019: As part of our ongoing support of the Cassandra community, DataStax has made the Apache Kafka® Connector freely available for open source Cassandra users. Learn more here!
Built by the team that authors the DataStax Drivers for Apache Cassandra®, the DataStax Apache Kafka Connector capitalizes on the best practices of ingesting to DataStax Enterprise (DSE) while delivering enterprise-grade resiliency and security.
Modern architectures are made up of a diverse landscape of technologies, each serving its purpose within the data ecosystem. Apache Kafka fits naturally as a distributed queue for event-driven architectures, serving as a buffer layer to transport the messages to the database and surrounding technologies.
There is no better solution in the market to complement Apache Kafka than DSE. As an operational data layer and hybrid cloud database, DSE delivers a multi-model persistent data store that never goes down and scales horizontally to deliver real-time access that is needed to serve enriched, personalized applications.
Automatic Ingest from Kafka to DSE
The DataStax Apache Kafka Connector is the bridge that allows data to seamlessly move from Apache Kafka to DSE in event-driven architectures. Known in the Kafka Connect framework as a sink, the key features of this connector are its market-leading performance, flexibility, security, and visibility. All of this is offered with DataStax Enterprise and Apache Cassandra at no additional cost.
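To make the moving parts concrete, here is a minimal sink configuration sketch. The topic, keyspace, table, and host names are placeholders, and the connector class name may differ between connector versions, so check the documentation for your release:

```properties
# Minimal DataStax sink configuration (hypothetical names throughout)
name=dse-sink
connector.class=com.datastax.kafkaconnector.DseSinkConnector
tasks.max=1
topics=my_topic
contactPoints=dse-host-1
loadBalancing.localDc=dc1
# Map Kafka record fields to DSE table columns
topic.my_topic.my_ks.my_table.mapping=id=key, payload=value.payload
```

This file is submitted to a Kafka Connect worker like any other sink connector configuration.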
As mentioned, the DataStax Apache Kafka Connector is built by the experts who develop and maintain Apache Cassandra’s drivers. Without going into the weeds, the same techniques that allow the DataStax Bulk Loader to outperform all other bulk loading solutions for Cassandra are also leveraged in the connector.
Flexibility
The design of this sink considers the varying data structures that are found in Apache Kafka, and the selective mapping functionality in the connector allows the user to specify the Kafka fields that should be written to DSE columns. This allows a single connector instance to read from multiple Apache Kafka topics and write to many DSE tables, thereby removing the burden of managing several connector instances. Whether the Apache Kafka data is in Avro, JSON, or string format, the DataStax Apache Kafka Connector provides advanced parsing to account for the wide range of data inputs.
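As a sketch of that mapping flexibility, the fragment below has one connector instance consuming two topics into two tables. All topic, keyspace, table, and field names here are hypothetical, and the `key`/`value` path syntax should be verified against the connector documentation for your version:

```properties
# One connector instance, two topics, two tables (hypothetical names)
topics=orders, customers
topic.orders.my_ks.orders_by_id.mapping=order_id=value.id, amount=value.amount, created=value.created_at
topic.customers.my_ks.customers_by_email.mapping=email=key, name=value.name
```

Only the fields named in each mapping are pulled from the Kafka record; everything else in the record is ignored for that table.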
Security
One of the core value propositions of DSE is its enterprise-grade security. With built-in SSL, LDAP/Active Directory, and Kerberos integration, DSE contains the tools needed to achieve strict compliance regulations for the connection from client to server. These security features are also included in the DataStax Apache Kafka Connector, ensuring that the connection between the connector and the data store is secure.
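A hedged sketch of securing the connector-to-DSE connection with SSL plus username/password authentication is shown below. The paths and credentials are placeholders, and the exact property names should be confirmed against the connector documentation for your version:

```properties
# Encrypt and authenticate the connector => DSE connection (placeholder values)
ssl.provider=JDK
ssl.truststore.path=/path/to/truststore.jks
ssl.truststore.password=changeit
ssl.hostnameValidation=true
auth.provider=PLAIN
auth.username=connect_user
auth.password=connect_pass
```

Kerberos and LDAP/Active Directory connections are configured through the same `auth.*` family of settings.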
Visibility
With regard to visibility and error handling, we know that in complex distributed environments, things are bound to hit points of failure. The engineering team at DataStax took special care to account for these error scenarios, and all of the intelligence of the DataStax Drivers is applied in the DataStax Apache Kafka Connector. Additionally, the connector includes metrics that give the operator visibility into failure rates and latency indicators as messages pass from Kafka to DSE.
Available Now
We are excited to release this connector and improve the interoperability of DSE in the data ecosystem for DSE versions 5.0 and above. Stay tuned for upcoming blogs that will detail advanced usage of this sink, visit our documentation and examples for more information, and download the new connector today to try out in your own environment.
Learn about the DataStax Apache Kafka Connector in this short course.
Details of Connector Functionality Below
| Feature | Description |
| --- | --- |
| Fully supported by DataStax | DataStax fully supports and provides expert services for the connector |
| Consume Kafka primitive data format | Connector accepts Kafka record data that is in primitive type form |
| Consume Kafka JSON data format | Connector accepts Kafka record data that is in valid JSON form |
| Consume Kafka Avro data format | Connector accepts Kafka record data that is in valid Avro form |
| Pluggable Connect converters | Connector works with StringConverter, JsonConverter, AvroConverter, ByteArrayConverter, and numeric converters, as well as custom data converters. Note that the producer of the data must use the same converter as the connector |
| Provides JMX metrics | Connector exposes JMX metrics for record/failure counts and latency recordings |
| Runs within Connect worker | Connector is deployed in the Kafka Connect framework |
| At-least-once guarantee | Connector stores the offset in Kafka and will pick up where it left off if restarted. This minimizes rework, but there are situations where writes to DSE will be retried if many records are in a single failed batch. The connector ensures that no records are missed |
| Standalone mode support | Connector is deployed in the Kafka Connect framework and works in standalone mode (meant for dev/test) |
| Distributed mode / HA support | Connector is deployed in the Kafka Connect framework and works in distributed mode (meant for production) |
| Flexible Kafka topic => DSE table mapping | Connector extends flexible mapping functionality to control the specific fields that are pulled from Kafka and written to DSE |
| Single Kafka topic => multiple DSE tables | Connector enables common denormalization patterns for DSE by allowing a single topic to be written to many DSE tables |
| Connector throttling + parallelism | Connector has built-in throttling to limit the maximum concurrent requests that can be sent by a single connector instance. Parallelism is delivered through the integration with the Kafka Connect distributed framework and asynchronous connector internals |
| Flexible date/time/timestamp formats | Connector accounts for the case that typically separate teams write to the same Kafka deployment and may use varying formats for date/time fields |
| Configurable consistency level | Connector allows configuring DSE consistency level on a per topic-table basis |
| Row-level TTL | Connector allows configuring DSE row-level TTL on a per topic-table basis |
| Deletes | Connector allows configuring DSE deletes on a per topic-table basis |
| Handling of nulls | Connector allows configuring DSE null handling on a per topic-table basis |
| Error handling | Connector has built-in error handling for various failure scenarios, including bad mappings and DSE write issues |
| Offset management | Connector leverages the Kafka Connect framework to manage offsets by storing the offset in Kafka |
| Connector => DSE SSL | Connector allows configuring connection to DSE with SSL |
| Connector => DSE username/password | Connector allows configuring connection to DSE with username/password |
| Connector => DSE LDAP/Active Directory | Connector allows configuring connection to DSE with LDAP/Active Directory |
| Connector => DSE Kerberos | Connector allows configuring connection to DSE with Kerberos |
| Configurable DSE write timeout | Connector allows configuring write timeout to DSE |
| Connector => DSE compression | Connector allows configuring connection to DSE with compression strategies |
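Several of the per topic-table settings in the table above (consistency level, TTL, null handling, deletes, and single-topic-to-multiple-tables denormalization) can be sketched in one configuration fragment. The names below are hypothetical, and the exact property keys should be verified against the connector documentation for your version:

```properties
# Per topic-table behavior for a hypothetical "orders" topic
topic.orders.my_ks.orders_by_id.consistencyLevel=LOCAL_QUORUM
topic.orders.my_ks.orders_by_id.ttl=86400
topic.orders.my_ks.orders_by_id.nullToUnset=true
topic.orders.my_ks.orders_by_id.deletesEnabled=true
# The same topic can also feed a second table for a denormalized query path
topic.orders.my_ks.orders_by_customer.mapping=customer_id=value.customer_id, order_id=value.id
```

Because each setting is scoped to a topic-table pair, two tables fed by the same topic can use different consistency levels or TTLs.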