
I am confused by two seemingly contradictory statements about Cassandra:

  1. No reads before writes (presumably this is because writes are sequential whereas reads require scanning a primary key index)
  2. INSERT and UPDATE have identical semantics (stated in an older version of the CQL manual but presumably still considered essentially true)

Suppose I've created the following simple table:

CREATE TABLE data (
  id varchar PRIMARY KEY,
  names set<text>
);

Now I insert some values:

insert into data (id, names) values ('123', {'joe', 'john'});

Now if I do an update:

update data set names = names + {'mary'} where id = '123';

The results are as expected:

 id  | names
-----+-------------------------
 123 | {'joe', 'john', 'mary'}

Isn't this a case where a read has to occur before a write? The "cost" would seem to be the following:

  1. The cost of reading the column
  2. The cost of creating a union of the two sets (negligible here but could be noticeable with larger sets)
  3. The cost of writing the data with the key and new column data

An insert would merely be doing the last of these.

John D.

1 Answer


There is no need for a read before the write.
Internally, each collection stores its data using one column per entry. When you add an entry to a collection, the operation touches only that single column*: if the column already exists it is overwritten, otherwise it is created (InsertOrUpdate). This is why each entry in a collection can have its own TTL and writetime.

*With Map and Set this is transparent; with List, an internal trick is needed to allow multiple columns with the same name.
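
To make the "one column per entry" idea concrete, here is a minimal Python sketch of a toy write path. It is not Cassandra's actual storage code: `ToyStore`, `add_to_set`, `read_set`, and the `(row id, column name, element)` key layout are all hypothetical names chosen for illustration, and replacing a whole collection is deliberately not modeled. It only shows that adding an element can be a blind upsert of one cell, so no existing data has to be read first.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

# Hypothetical cell key: (row id, column name, set element).
CellKey = Tuple[str, str, str]


@dataclass
class ToyStore:
    """Toy model only; all names here are assumptions, not Cassandra APIs."""
    cells: Dict[CellKey, int] = field(default_factory=dict)  # key -> write timestamp
    reads: int = 0  # incremented only when a query actually reads cells

    def add_to_set(self, row_id: str, column: str,
                   elements: Iterable[str], timestamp: int) -> None:
        # Blind upsert: each element becomes its own cell. Nothing that is
        # already stored is inspected; an existing cell is simply overwritten.
        for element in elements:
            self.cells[(row_id, column, element)] = timestamp

    def read_set(self, row_id: str, column: str) -> Set[str]:
        # The "set" is just a view assembled from the individual cells.
        self.reads += 1
        return {e for (r, c, e) in self.cells if r == row_id and c == column}


store = ToyStore()
store.add_to_set("123", "names", {"joe", "john"}, timestamp=1)  # like the INSERT
store.add_to_set("123", "names", {"mary"}, timestamp=2)         # like the UPDATE
reads_during_writes = store.reads                                # still 0

store.add_to_set("123", "names", {"mary"}, timestamp=3)         # repeat the UPDATE
names = sorted(store.read_set("123", "names"))
print(names)  # ['joe', 'john', 'mary']
```

In this model, repeating the update does not grow the set; it only replaces the timestamp on the existing `mary` cell, which mirrors the per-entry writetime described above.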

Carlo Bertuccini
  • If I understand you correctly what you're saying is that my insert and update above are simply two inserts. The original one establishes one column with {'joe', 'john'} and the second one establishes an entirely new column with the value {'mary'}. That these are presented as a set in one column (names) is effectively a view on what is in fact multiple columns. – John D. Apr 30 '15 at 16:40
  • Partially correct. The first insert establishes two new columns, not one. And if you perform the update operation again, you will overwrite the last column (mary) with a new writetime. There is no read in either case. – Carlo Bertuccini Apr 30 '15 at 17:03