I am confused by two seemingly contradictory statements about Cassandra:
- No reads before writes (presumably this is because writes are sequential whereas reads require scanning a primary key index)
- INSERT and UPDATE have identical semantics (stated in an older version of the CQL manual but presumably still considered essentially true)
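To make these two statements concrete, here is a toy model in Python. It is not Cassandra's real storage engine, and the class `ToyStore` and its methods are hypothetical. It assumes a simplification: every write, whether issued as INSERT or UPDATE, just appends timestamped cells without looking at what is already stored, and when a row is read the newest timestamp wins for each column.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    # Hypothetical toy model, not Cassandra's storage engine: an append-only
    # list of (key, column, value, timestamp) cells.
    log: list = field(default_factory=list)
    clock: int = 0

    def _write(self, key, values):
        # Blind write: nothing already stored is consulted.
        self.clock += 1
        for column, value in values.items():
            self.log.append((key, column, value, self.clock))

    def insert(self, key, values):
        # INSERT and UPDATE share one code path in this model (an "upsert").
        self._write(key, values)

    def update(self, key, values):
        self._write(key, values)

    def read(self, key):
        # Reconcile at read time: for each column, the newest timestamp wins.
        newest = {}
        for k, column, value, ts in self.log:
            if k == key and (column not in newest or ts > newest[column][1]):
                newest[column] = (value, ts)
        return {column: value for column, (value, _) in newest.items()}


store = ToyStore()
store.update("456", {"city": "Oslo"})   # key absent: UPDATE creates the row
store.insert("456", {"city": "Paris"})  # key present: INSERT overwrites it
print(store.read("456"))                # {'city': 'Paris'}
```

In this model an UPDATE against a key that does not exist yet simply creates it, and an INSERT against an existing key overwrites it without error. Neither needs a read, because all reconciliation is deferred to `read`.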
Suppose I've created the following simple table:
CREATE TABLE data (
    id varchar PRIMARY KEY,
    names set<text>
);
Now I insert some values:
insert into data (id, names) values ('123', {'joe', 'john'});
Now if I do an update:
update data set names = names + {'mary'} where id = '123';
The results are as expected:
 id  | names
-----+-------------------------
 123 | {'joe', 'john', 'mary'}
Isn't this a case where a read has to occur before a write? The "cost" would seem to be the following:
- The cost of reading the column
- The cost of creating a union of the two sets (negligible here but could be noticeable with larger sets)
- The cost of writing the data with the key and new column data
An insert would involve only the last of these.
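One way to see where the costs listed above could go is a toy Python model of a `set<text>` column. It assumes, as is commonly described for Cassandra's non-frozen collections, that each set element is stored as its own cell. The class `ToySetColumn` and its methods are hypothetical and only illustrate the idea: `names = names + {'mary'}` becomes a blind append of one cell, and the union is computed when the row is read.

```python
class ToySetColumn:
    """Hypothetical model of a set<text> column stored as one cell per element."""

    def __init__(self):
        self.clock = 0
        self.cells = {}       # key -> list of (element, timestamp) cells
        self.tombstones = {}  # key -> timestamp of the latest full overwrite

    def _tick(self):
        self.clock += 1
        return self.clock

    def overwrite(self, key, elements):
        # names = {...} (or INSERT with a set literal): mark older cells as
        # deleted, then append the new cells. No read is needed.
        ts = self._tick()
        self.tombstones[key] = ts
        self.cells.setdefault(key, []).extend((e, ts) for e in elements)

    def add(self, key, elements):
        # names = names + {...}: append one cell per element. No read is needed.
        ts = self._tick()
        self.cells.setdefault(key, []).extend((e, ts) for e in elements)

    def read(self, key):
        # The union happens here: keep cells not shadowed by the latest
        # overwrite; duplicates collapse because the result is a set.
        cutoff = self.tombstones.get(key, 0)
        return {e for e, ts in self.cells.get(key, []) if ts >= cutoff}


names = ToySetColumn()
names.overwrite("123", {"joe", "john"})  # the INSERT
names.add("123", {"mary"})               # the UPDATE with names + {'mary'}
print(sorted(names.read("123")))         # ['joe', 'john', 'mary']
```

Under this model the union cost moves from write time to read time (and, in a real store, to compaction), so the append itself needs no read. Overwriting the whole set is modeled as a deletion marker plus new cells, so it does not need a read either.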