
I made an observation while patching a LIST in C* and I hope somebody can tell me whether there is a rational explanation for it (setting aside whether the use case is actually valid).

Assume a simple table with a single primary key column and one column of type list.

CREATE TABLE ks.tbl(col_primary varchar,varchar_list list<varchar>,PRIMARY KEY(col_primary)) ;

Next, insert one row whose list contains some entries:

INSERT INTO ks.tbl (col_primary,varchar_list) VALUES ('0815',['OlencUIkqqlVOFPiwsoEJM','JamilUOHIOXTWuGp','AbdulvZaeQDJOdu','GoldaGugnVNnbdSBpRpd','BrennaVvYuDyERsKvVW','FletcherpkkCYpEBket','DaytonglCSvswZQTEj','EdTUkTShUerYcfiSvCIH','LandenLTThnmlAAULJwdNwAma','IsabellelrDcMFHsyBGT','ArielOhIcLglehg','BellrtifChchjMZ','EmelieDdlViBlHUPQbxyUC']);

Finally, update the row with the following statement:

UPDATE ks.tbl SET varchar_list[1]=null,varchar_list[0]='MEGGCJOFic',varchar_list=varchar_list+['nwbaGsGbcd'] WHERE col_primary='0815' IF EXISTS;

The expected content of the list is (and most of the time actually is):

['MEGGCJOFic', 'AbdulvZaeQDJOdu', 'GoldaGugnVNnbdSBpRpd', 'BrennaVvYuDyERsKvVW', 'FletcherpkkCYpEBket', 'DaytonglCSvswZQTEj', 'EdTUkTShUerYcfiSvCIH', 'LandenLTThnmlAAULJwdNwAma', 'IsabellelrDcMFHsyBGT', 'ArielOhIcLglehg', 'BellrtifChchjMZ', 'EmelieDdlViBlHUPQbxyUC', 'nwbaGsGbcd']

Now, when this is applied to a setup with two datacenters (US, EU), using consistency level LOCAL_ONE, with one datacenter used for the update and the other for reading, the following surprising result is sometimes returned:

 ['MEGGCJOFic', 'nwbaGsGbcd']

These are exactly the two elements that changed. After some time the list resolves itself and the expected content is returned.

But how is it possible to get into such an intermediate state? By the way, the same happens when using MAPs instead of LISTs. I know how collections are physically laid out in C*, but how can one datacenter contain only the updates and not the original data?

Horia
smigfu

1 Answer


This follows from how Cassandra (C*) works and from its internal architecture. The more you work with C*, the more you'll learn about its behavior. There is a lot to explain, but I'll mention some specific points and try to make it clear.

How C* stores data

How is data updated

The CAP Theorem

Cassandra and CAP

Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered more important than consistency. However, by tuning the replication factor and the consistency level, Cassandra can also provide consistency.

C* Eventual Consistency:

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

C* Consistency Level:

RF -> how many copies of the data (row) are kept, i.e. how many nodes store the same row.

CL -> how many replicas must acknowledge a write, or respond to a read, before the coordinator reports success to the client. For example, with CL TWO, at least 2 replicas must confirm that they have written the data; for a read, the coordinator waits until 2 replicas have returned their results, merges them (keeping the latest data when different replicas hold different updates of the same data), and returns the merged result to the client.
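To make the RF/CL relationship concrete, here is a minimal Python sketch of how many replica responses a coordinator waits for at common consistency levels. It assumes a hypothetical topology of two datacenters (US, EU) with RF 3 in each, because the question does not state the RF; `replicas_required` is an illustrative helper, and levels such as ANY or THREE are left out.

```python
# Hypothetical topology: the question gives no RF, so assume RF 3 per datacenter.
RF_PER_DC = {"US": 3, "EU": 3}


def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def replicas_required(cl: str, local_dc: str, rf_per_dc: dict) -> int:
    """Number of replica responses the coordinator waits for (simplified)."""
    total_rf = sum(rf_per_dc.values())
    if cl in ("ONE", "LOCAL_ONE"):
        return 1
    if cl == "TWO":
        return 2
    if cl == "QUORUM":            # majority of all replicas in all DCs
        return quorum(total_rf)
    if cl == "LOCAL_QUORUM":      # majority of the replicas in the local DC only
        return quorum(rf_per_dc[local_dc])
    if cl == "EACH_QUORUM":       # a majority in every DC (typically used for writes)
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if cl == "ALL":
        return total_rf
    raise ValueError(f"not modelled in this sketch: {cl}")


print(replicas_required("LOCAL_ONE", "EU", RF_PER_DC))     # 1
print(replicas_required("LOCAL_QUORUM", "EU", RF_PER_DC))  # 2
print(replicas_required("QUORUM", "EU", RF_PER_DC))        # 4
print(replicas_required("EACH_QUORUM", "EU", RF_PER_DC))   # 4
print(replicas_required("ALL", "EU", RF_PER_DC))           # 6
```

With this assumed topology, LOCAL_ONE waits for a single replica out of six, which is why it is fast but may answer from a replica that has not yet seen every write.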

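Because your CL is LOCAL_ONE, only one replica in the local DC has to respond, and that replica may still hold stale (old) data; it will be updated eventually. Using LOCAL_QUORUM for both writes and reads gives you consistent data within a single datacenter. In your case, however, you write in one DC and read in the other, so the two local quorums never have to overlap; you would need, for example, EACH_QUORUM for writes with LOCAL_QUORUM for reads, or QUORUM for both. For collections, the way data is stored is also a little different.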
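The overlap requirement can be checked by brute force. The sketch below assumes the same hypothetical US/EU topology with three replicas per DC (the node names are made up). It enumerates every replica set that could satisfy the write CL and the read CL and reports whether every read set is guaranteed to share at least one replica with every write set. It ignores timestamps and the lightweight-transaction path used by IF EXISTS.

```python
from itertools import combinations, product

# Hypothetical topology (the question gives no RF): 3 replicas per datacenter.
REPLICAS = {"US": ["us1", "us2", "us3"], "EU": ["eu1", "eu2", "eu3"]}


def quorum(n: int) -> int:
    return n // 2 + 1


def possible_sets(cl: str, dc: str) -> list:
    """Every minimal replica set whose responses satisfy `cl` for a client in `dc`."""
    everyone = [node for nodes in REPLICAS.values() for node in nodes]
    local = REPLICAS[dc]
    if cl == "LOCAL_ONE":
        return [set(c) for c in combinations(local, 1)]
    if cl == "LOCAL_QUORUM":
        return [set(c) for c in combinations(local, quorum(len(local)))]
    if cl == "QUORUM":
        return [set(c) for c in combinations(everyone, quorum(len(everyone)))]
    if cl == "EACH_QUORUM":
        per_dc = [combinations(nodes, quorum(len(nodes))) for nodes in REPLICAS.values()]
        return [set().union(*choice) for choice in product(*per_dc)]
    raise ValueError(f"not modelled in this sketch: {cl}")


def read_sees_write(write_cl: str, read_cl: str,
                    write_dc: str = "US", read_dc: str = "EU") -> bool:
    """True if every possible read set shares a replica with every possible write set."""
    return all(w & r
               for w in possible_sets(write_cl, write_dc)
               for r in possible_sets(read_cl, read_dc))


print(read_sees_write("LOCAL_ONE", "LOCAL_ONE"))        # False
print(read_sees_write("LOCAL_QUORUM", "LOCAL_QUORUM"))  # False: write and read in different DCs
print(read_sees_write("EACH_QUORUM", "LOCAL_QUORUM"))   # True
print(read_sees_write("QUORUM", "QUORUM"))              # True
```

The W + R > RF rule only guarantees a fresh read when both counts are taken over the same set of replicas; LOCAL_* levels restrict each side to its own DC, which is why they do not help across datacenters.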

Using the Collections

Appending and prepending to a list are implemented internally without any read-before-write: the update writes only the new element. (Setting or removing an element by position, as in varchar_list[1]=null, does incur an internal read-before-write.)
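A simplified model of why an append needs no read: roughly, Cassandra stores each list element as its own cell, named by a time-based UUID. In this Python sketch an integer counter stands in for that UUID (an assumption for readability), and `append` and `read_list` are hypothetical helpers, not Cassandra APIs.

```python
import itertools

# Simplified model: each list element is a separate cell. A monotonically
# increasing counter stands in for the time-based UUID Cassandra uses.
_clock = itertools.count(1)


def append(values):
    """Return the mutation an append produces: one new cell per element.

    Nothing is read first; existing cells are neither read nor rewritten."""
    return {next(_clock): v for v in values}


def read_list(cells):
    """Materialise the list by ordering cells by their key."""
    return [cells[k] for k in sorted(cells)]


# Like INSERT ... ['a', 'b', 'c'] (a real INSERT of a whole list also
# deletes any previous contents; that is omitted here).
stored = append(["a", "b", "c"])
mutation = append(["d"])       # like UPDATE ... SET l = l + ['d']
print(mutation)                # {4: 'd'}: only the new element is written
stored.update(mutation)        # a replica applying the mutation
print(read_list(stored))       # ['a', 'b', 'c', 'd']
```

Because the mutation carries only the new cell, a replica that receives it before the original INSERT holds just that one element.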

Summary & Possibilities

C* is a row-oriented DB: it stores rows of data (columns) mapped to their key. Because C* does not read before writing, many versions of the same row, possibly each holding only some of its cells, can exist in the database at the same time. When a read is requested, the coordinator merges the versions it receives by comparing timestamps and returns the latest value for each cell. The CL determines how many replicas must respond to the coordinator. If Write CL + Read CL > RF (counted over the same replicas), you'll get the latest result.
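The timestamp-based merge can be sketched as per-cell last-write-wins. This is a simplified Python model with hypothetical data: each replica's answer is a dict of cell name to (write timestamp, value), and `merge` is an illustrative helper rather than Cassandra's actual reconciliation code.

```python
from typing import Dict, Optional, Tuple

# Simplified model: cell name -> (write timestamp, value); None would be a tombstone.
Cells = Dict[str, Tuple[int, Optional[str]]]


def merge(*responses: Cells) -> Cells:
    """Coordinator-side reconciliation: per cell, the highest timestamp wins.

    Cassandra has its own tie-breaking rules for equal timestamps; this
    sketch simply keeps the first version it saw in that case."""
    merged: Cells = {}
    for response in responses:
        for key, (ts, value) in response.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


# Two replicas hold different versions of the same row (hypothetical data).
replica_a = {"name": (100, "old"), "city": (100, "Berlin")}
replica_b = {"name": (200, "new")}   # has only the later UPDATE

print(merge(replica_a, replica_b))
# {'name': (200, 'new'), 'city': (100, 'Berlin')}
```

The result does not depend on which replica answers first, because each cell is decided by its timestamp alone.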

Example: suppose you have a list [1,2,3] and you append [4]. The expected result is [1,2,3,4]. Because you read with LOCAL_ONE, the coordinator may hit a replica that has received only the new update (the appended element) but not the original list, and it returns exactly that.
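Applying the same idea to the question's data shows how a replica can return exactly ['MEGGCJOFic', 'nwbaGsGbcd']. This sketch assumes the EU replica has received the UPDATE mutation but not yet the INSERT, uses integer cell keys instead of time-based UUIDs, shortens the original list to three elements, and assumes the index positions in the UPDATE were already resolved to cell keys before the mutation was sent. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Simplified model: cell key -> (timestamp, value); None stands for a tombstone.
Cells = Dict[int, Tuple[int, Optional[str]]]


def merge(*replicas: Cells) -> Cells:
    """Per-cell last-write-wins, as a coordinator does when merging replicas."""
    out: Cells = {}
    for cells in replicas:
        for key, (ts, value) in cells.items():
            if key not in out or ts > out[key][0]:
                out[key] = (ts, value)
    return out


def as_list(cells: Cells) -> list:
    """Order by cell key and skip tombstoned elements."""
    return [value for _, (_, value) in sorted(cells.items()) if value is not None]


# INSERT (shortened to three elements), written at timestamp 100.
insert = {1: (100, "OlencUIkqqlVOFPiwsoEJM"),
          2: (100, "JamilUOHIOXTWuGp"),
          3: (100, "AbdulvZaeQDJOdu")}
# UPDATE at timestamp 200: [0] = 'MEGGCJOFic', [1] = null, append 'nwbaGsGbcd'.
# Assumption: indexes 0 and 1 were already resolved to cell keys 1 and 2.
update = {1: (200, "MEGGCJOFic"),
          2: (200, None),
          4: (200, "nwbaGsGbcd")}

# An EU replica that has received the UPDATE but not yet the INSERT:
print(as_list(update))                 # ['MEGGCJOFic', 'nwbaGsGbcd']
# After the replicas converge (or a read merges both versions):
print(as_list(merge(insert, update)))  # ['MEGGCJOFic', 'AbdulvZaeQDJOdu', 'nwbaGsGbcd']
```

Once the INSERT reaches that replica, or a read merges both versions, the full list reappears, which matches the self-resolving behavior described in the question.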

Reading right after the write can therefore return an inconsistent result. Once the replicas have converged (for example through hinted handoff, read repair or repair), later reads return the updated, merged row.

In distributed systems these are normal scenarios, although they would be very unusual in a traditional RDBMS architecture.

Some Links:

You can also read about the C* read and write path:

The write path to compaction
How is data read?
4 node setup in cassandra is as same as 3 node setup
Understanding How CQL3 Maps to Cassandra's Internal Data Structure

Chaity
What you describe sounds reasonable. I hadn't considered the possibility of hitting a node that contains only the latest list modifications without the initial data. Great explanation, thank you! – smigfu Mar 13 '18 at 10:01