CQL for Cassandra 2.0

COPY

Imports and exports CSV (comma-separated values) data to and from Cassandra.

Synopsis

COPY table_name ( column, ...)
FROM ( 'file_name' | STDIN ) 
WITH option = 'value' AND ...

COPY table_name ( column , ... )
TO ( 'file_name' | STDOUT )
WITH option = 'value' AND ...

Synopsis Legend

  • Uppercase means literal
  • Lowercase means not literal
  • Italics mean optional
  • The pipe (|) symbol means OR or AND/OR
  • Ellipsis (...) means repeatable
  • Parentheses ( and ) are not literal; they indicate scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description

Using the COPY options in a WITH clause, you can change the CSV format. This table describes these options:
COPY options

Option      Default              Use to
DELIMITER   comma (,)            Set the character that separates fields in the file.
QUOTE       quotation mark (")   Set the character that encloses field values.
ESCAPE      backslash (\)        Set the character that escapes literal uses of the QUOTE character.
HEADER      false                Set to true to indicate that the first row of the file is a header.
ENCODING    UTF8                 Set the character encoding of COPY TO output.
NULL        an empty string      Set the string that represents an absent value.

The ENCODING option cannot be used in the COPY FROM command.

As the table shows, by default Cassandra expects the CSV data to consist of fields separated by commas (,), records separated by line endings (\r\n), and field values enclosed in double quotation marks (""). To avoid ambiguity, escape a literal double quotation mark with a backslash inside a double-quoted string ("\"").

By default, Cassandra does not expect the first line of the CSV file to be a header record listing the column names. With HEADER=true, COPY TO includes the header in the output, and COPY FROM ignores the first line of the input.

COPY FROM a CSV file

By default, the COPY FROM command expects every row in the CSV input to contain the same number of fields as there are columns in the Cassandra table metadata, and it assigns fields to columns in that order. To apply your input data to a particular set of columns, specify the column names in parentheses after the table name.

COPY FROM is intended for importing small datasets (a few million rows or less) into Cassandra. To import larger datasets, use the Cassandra bulk loader or the sstable2json/json2sstable utilities.

COPY TO a CSV file

For example, assume you have the following table in CQL:

cqlsh> SELECT * FROM test.airplanes;

 name          | mach | manufacturer | year
---------------+------+--------------+------
 P38-Lightning |  0.7 |     Lockheed | 1937
      
(1 rows)

After inserting data into the table, you can copy the data to a CSV file in another order by specifying the column names in parentheses after the table name:

COPY airplanes
(name, mach, year, manufacturer)
TO 'temp.csv';

Specifying the source or destination files

Specify the source file of the CSV input or the destination file of the CSV output by a file path. Alternatively, you can use the STDIN or STDOUT keywords to import from standard input or export to standard output. When using STDIN, signal the end of the CSV data with a backslash and period ("\.") on a separate line. If the data is being imported into a table that already contains data, COPY FROM does not truncate the table beforehand.

You can copy only a partial set of columns: specify the entire set or a subset of column names in parentheses after the table name, in the order you want to import or export them. By default, the COPY TO command copies data to the CSV file in the order defined in the Cassandra table metadata. Omit the column list to import or export all the columns in the order they appear in the source table or CSV file.
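For example, a file intended for COPY FROM with an explicit column list can be produced by any CSV writer, provided the field order matches that column list. A minimal Python sketch, using hypothetical rows and a hypothetical file name new_planes.csv:

```python
import csv

# Hypothetical rows for illustration. The field order must match the
# column list given to COPY, e.g.:
#   COPY airplanes (name, mach, year, manufacturer) FROM 'new_planes.csv';
rows = [
    ('B-17 Flying Fortress', 0.3, 1935, 'Boeing'),
    ('DC-3', 0.4, 1935, 'Douglas'),
]
with open('new_planes.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```

The resulting file could then be loaded with COPY airplanes (name, mach, year, manufacturer) FROM 'new_planes.csv';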

Roundtrip copying of a simple table

Copy a table to a CSV file.

  1. Using CQL, create a table named airplanes and copy it to a CSV file.
    CREATE KEYSPACE test
      WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
    
    USE test;
    
    CREATE TABLE airplanes (
      name text PRIMARY KEY,
      manufacturer ascii,
      year int,
      mach float
    );
    
    INSERT INTO airplanes
      (name, manufacturer, year, mach)
      VALUES ('P38-Lightning', 'Lockheed', 1937, 0.7);
    
    COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv';
    1 rows exported in 0.004 seconds.
  2. Clear the data from the airplanes table and import the data from the temp.csv file.
    TRUNCATE airplanes;
    
    COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
    1 rows imported in 0.087 seconds.

Copy data from standard input to a table.

  1. Enter data directly during an interactive cqlsh session, using the COPY command defaults.
    COPY airplanes (name, manufacturer, year, mach) FROM STDIN;
  2. At the [copy] prompt, enter the following data:
    "F-14D Super Tomcat", Grumman,"1987", "2.34"
    "MiG-23 Flogger", Russian-made, "1964", "2.35"
    "Su-27 Flanker", U.S.S.R.,"1981", "2.35"
    \.
  3. Query the airplanes table to see data imported from STDIN:
    SELECT * FROM airplanes;

Output is:

 name               | manufacturer | year | mach
--------------------+--------------+------+------
 F-14D Super Tomcat |      Grumman | 1987 | 2.34
      P38-Lightning |     Lockheed | 1937 |  0.7
      Su-27 Flanker |     U.S.S.R. | 1981 | 2.35
     MiG-23 Flogger | Russian-made | 1964 | 2.35
      
(4 rows)
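Note that the rows entered at the [copy] prompt include spaces after some commas, which cqlsh tolerates. As an illustration (an assumption about reader settings, not taken from this document), a strict CSV parser has to be told to skip that leading whitespace so the quoted fields still parse cleanly:

```python
import csv
import io

# One of the lines entered at the [copy] prompt above. Without
# skipinitialspace=True, the space after a comma would keep the
# following quote from being treated as a quoting character.
line = '"F-14D Super Tomcat", Grumman,"1987", "2.34"\n'
row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(row)  # ['F-14D Super Tomcat', 'Grumman', '1987', '2.34']
```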

Copying collections

Cassandra 2.0 and later supports round-trip copying of collections to and from CSV files. To perform this example, download the sample code now.

  1. Unzip the downloaded file named cql_collections.zip.
  2. Copy/paste all the CQL commands from the cql_collections.txt file to the cqlsh command line.
  3. Take a look at the contents of the songs table. The table contains a map of venues, a list of reviews, and a set of tags.
    cqlsh> SELECT * FROM music.songs;
    
     id         |album|artist|data|reviews            |tags              |title|venue
    ------------+-----+------+----+-------------------+------------------+-----+----------------------------------------------------------------------
     7db1a490...| null| null |null|['hot dance music']|          {'rock'}| null|{'2013-09-22...': 'The Fillmore', '2013-10-01...': 'The Apple Barrel'}
     a3e64f8f...| null| null |null|               null| {'1973', 'blues'}| null|null
     8a172618...| null| null |null|               null|{'2007', 'covers'}| null|null
    
    (3 rows)
  4. Copy the music.songs table to a CSV file named songs-20140603.csv.
    cqlsh> COPY music.songs to 'songs-20140603.csv';
    3 rows exported in 0.020 seconds.
  5. Check that the copy operation worked.
    cqlsh> exit;
    
    $ cat songs-20140603.csv
    7db1a490...,,,,['hot dance music'],{'rock'},,"{'2013-09-22...': 'The Fillmore', '2013-10-01...': 'The Apple Barrel'}"
    a3e64f8f...,,,,,"{'1973', 'blues'}",,
    8a172618...,,,,,"{'2007', 'covers'}",,
  6. Start cqlsh again, and create a table definition that matches the data in the songs-20140603.csv file.
    cqlsh> CREATE TABLE music.imported_songs (
            id uuid PRIMARY KEY,
            album text,
            artist text,
            data blob,
            reviews list<text>,
            tags set<text>,
            title text,
            venue map<timestamp, text>
           );
  7. Copy the data from the CSV file into the imported_songs table.
    cqlsh> COPY music.imported_songs from 'songs-20140603.csv';
    3 rows imported in 0.074 seconds.
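The quoting in the exported file above is what makes the round trip work: collection literals contain commas, so COPY TO wraps them in the QUOTE character, and a CSV reader then yields exactly one field per table column. A short Python check of one exported line (a sketch, not part of the original examples):

```python
import csv
import io

# One line from songs-20140603.csv: the tags set is quoted because it
# contains a comma, so the parser returns 8 fields, one per column
# (id, album, artist, data, reviews, tags, title, venue).
line = "a3e64f8f...,,,,,\"{'1973', 'blues'}\",,\n"
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))  # 8
print(fields[5])    # {'1973', 'blues'}
```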