4

I'm new to Cassandra. CQL appears to ignore the column order in a CREATE TABLE statement, and order columns by primary key first and then by remaining columns in lexicographic order. I understand that's how they're stored internally, but coming from a traditional database perspective, disregarding column order and leaking that implementation detail to the user is very surprising. Is this documented anywhere?

[cqlsh 4.1.1 | Cassandra 2.1.8 | CQL spec 3.1.1 | Thrift protocol 19.39.0]

cqlsh:test> create table test (c int primary key, b text, a int);
cqlsh:test> describe table test;

CREATE TABLE test (
  c int,
  a int,
  b text,
  PRIMARY KEY (c)
)

That makes it difficult to import a CSV file with columns in the order you thought you were using.

cqlsh:test> copy test from stdin;
[Use \. on a line by itself to end input]
[copy] 1,abc,2
Bad Request: line 1:44 no viable alternative at input ',' (... c, b) VALUES ([abc],...)
Aborting import at record #0 (line 1). Previously-inserted values still present.

0 rows imported in 7.982 seconds.

cqlsh:test> copy test from stdin;
[Use \. on a line by itself to end input]
[copy] 1,2,abc
[copy] \.

1 rows imported in 14.911 seconds.

The solution appears to be to specify the columns in the COPY statement (or reorder you CSV data).

copy test (c, b, a) from stdin;
[Use \. on a line by itself to end input]
[copy] 1,abc,2
[copy] \.

1 rows imported in 5.727 seconds.
Doctor J
  • 5,974
  • 5
  • 44
  • 40

2 Answers2

0

You should specify the columns you wish to transact with. Never assume column order with Cassandra, even if you alter your csv file to match the order it's safer just to specify the exact columns even on tables with many columns.

Cassandra uses column order and specific storage locations to make accessing data faster.

Vendrad
  • 33
  • 6
0

Cassandra orders its columns as following:

  1. Partition Key
  2. Clustering Key
  3. Remaining columns in alphabetical order.

For example say I create the following table:

CREATE TABLE products (
product_id text,
account_id text,
avg_rating float,
brand text,
brand_name text
PRIMARY KEY (product_id, account_id)
) WITH CLUSTERING ORDER BY (account_id ASC);

1st column = product_id (since it's the partition key)
2nd column = account_id (since it's a clustering key)
rest of the columns are in alphabetical order.

Bigby
  • 321
  • 5
  • 16