Migrating data using other methods
Apache Sqoop, covered earlier, transfers data between an RDBMS and Hadoop. DataStax Enterprise modified Sqoop so you can move data directly into Cassandra as well as transfer data from an RDBMS to the Cassandra File System (CFS). DataStax offers several solutions in addition to Sqoop for migrating from other databases:
- The COPY command, which mirrors what the PostgreSQL RDBMS uses for file export/import
- The DSE Search/Solr Data Import Handler, which is a configuration-driven method for importing data to be indexed for searching
- The Cassandra bulk loader, which provides the ability to bulk load external data into a cluster
About the COPY command
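You can use COPY in Cassandra’s CQL shell to load flat file data into Cassandra as well as write data out to OS files. Typically, an RDBMS has unload utilities for writing table data to OS files, so a common migration path is to unload each source table to a flat file and then load that file with COPY.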
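The unload-then-COPY path can be sketched as follows. This is a minimal illustration, not a DataStax-supplied procedure: an in-memory SQLite database stands in for the source RDBMS, and the `users` table, the `demo` keyspace, the `users.csv` path, and both helper functions are hypothetical names chosen for the example. The script writes the table as CSV with a header row and builds the matching `COPY ... FROM` statement, which you would then run in the CQL shell against an existing Cassandra table with the same columns.

```python
import csv
import io
import sqlite3


def unload_table_to_csv(conn, table, columns, key, out):
    """Write the rows of `table` to the file-like `out` as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(columns)  # header row, matched by HEADER = true in COPY
    query = f"SELECT {', '.join(columns)} FROM {table} ORDER BY {key}"
    writer.writerows(conn.execute(query))


def copy_from_statement(keyspace, table, columns, csv_path):
    """Build the CQL shell COPY statement that loads the unloaded CSV file."""
    return (
        f"COPY {keyspace}.{table} ({', '.join(columns)}) "
        f"FROM '{csv_path}' WITH HEADER = true;"
    )


# Stand-in for the source RDBMS (assumption: a real migration would use the
# database's own unload utility or driver instead of SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])

# Unload to an in-memory buffer; in practice, open a file such as users.csv.
buf = io.StringIO()
unload_table_to_csv(conn, "users", ["id", "name"], "id", buf)

statement = copy_from_statement("demo", "users", ["id", "name"], "users.csv")
print(buf.getvalue())
print(statement)
```

Writing data out works the same way in reverse: in the CQL shell, `COPY demo.users (id, name) TO 'users.csv' WITH HEADER = true;` exports the table to an OS file.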
If you need to do more than extract and load, you can use any of the extract-transform-load (ETL) solutions that now support Cassandra. These tools provide transformation routines for manipulating source data to suit your needs before loading it into a Cassandra target. They also offer other features, such as visual point-and-click interfaces and scheduling engines.
Many ETL vendors that support Cassandra supply free community editions of their products, which can handle many different use cases. Enterprise editions with additional features are also available.
You can freely download and try ETL tools from Jaspersoft, Pentaho, and Talend that work with DataStax Enterprise and Cassandra.