About memtable in Cassandra: hashtable or sorted array

Question

I've read some articles and the original paper about Cassandra. Now I'm confused about the memtable:

Some articles say that the rows in the memtable are ordered by row key, but others say that it's like a hashtable. Which is right?
About the partitioner: as there are basically two partitioners in Cassandra, RandomPartitioner and ByteOrderedPartitioner, is the order of rows in the memtable related to the partitioner I choose? (e.g. if I choose RP, are rows stored like in a hashtable, and if I choose BOP, are they ordered by key?)
If rows are ordered by key, how does the memtable handle inserts? (Does an insert cause rows to be moved?)
Does it have anything to do with the primary index (the row key index) implicitly maintained by Cassandra?

score 1 · Accepted Answer · edited May 23 '17 at 12:12

In the future, please try to limit yourself to one question per question. Almost all of these could stand on their own as a single question.

Some articles say that the rows in the memtable are ordered by row key, but others say that it's like a hashtable. Which is right?

Cassandra stores its data in a "distributed hash table data structure" (Strickland, 2014). This allows data to be distributed evenly across the cluster, yet still queried quickly. The value of the row key (aka partition key) is hashed using a process called consistent hashing, and the row is then stored on whichever node is responsible for the token range that contains the key's hashed value (its token). If you run a CQL query without a WHERE clause and also select the token() of the partition key, you can see that the result set is ordered by the hashed row key value rather than by the key itself:

> SELECT userid, token(userid), posttime FROM postsbyuser;

 userid | token(userid)        | posttime 
--------+----------------------+-------------------------- 
      1 | -4069959284402364209 | 2015-01-25 13:25:00-0600 
      1 | -4069959284402364209 | 2015-01-25 13:22:00-0600 
      0 | -3485513579396041028 | 2015-01-25 13:21:00-0600 
      2 | -3248873570005575792 | 2015-01-25 13:28:00-0600 
      2 | -3248873570005575792 | 2015-01-25 13:27:00-0600 
      2 | -3248873570005575792 | 2015-01-25 13:26:00-0600
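To make the token ordering above concrete, here is a minimal Python sketch of consistent hashing on a hypothetical four-node ring. It uses MD5 (the hash RandomPartitioner is based on), reduced to [0, 2**127), as a simplified stand-in because Murmur3 is not in Python's standard library; the node names, tokens, and the helpers token_for() and owner() are illustrative, not Cassandra's API.

```python
import bisect
import hashlib

# Hypothetical 4-node ring. Each node owns the token range
# (previous node's token, its own token]; the first node also
# owns any wrap-around range past the last token.
RING = [
    (2**125, "node-a"),
    (2**126, "node-b"),
    (3 * 2**125, "node-c"),
    (2**127 - 1, "node-d"),
]
RING_TOKENS = [tok for tok, _ in RING]


def token_for(key: str) -> int:
    """Hash a partition key into [0, 2**127) with MD5 (simplified RandomPartitioner stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 2**127


def owner(key: str) -> str:
    """Return the node whose token range contains the key's token."""
    t = token_for(key)
    i = bisect.bisect_left(RING_TOKENS, t)  # first node token >= t
    return RING[i % len(RING)][1]           # wrap around the ring


for userid in ["0", "1", "2"]:
    print(userid, token_for(userid), owner(userid))
```

The tokens printed here will not match the Murmur3 values in the cqlsh output; only the mechanism (hash the key, then find the owning range) is the same.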

About the partitioner: as there are basically two partitioners in Cassandra, RandomPartitioner and ByteOrderedPartitioner, is the order of rows in the memtable related to the partitioner I choose? (e.g. if I choose RP, are rows stored like in a hashtable, and if I choose BOP, are they ordered by key?)

Ultimately, yes: the Random partitioner and the Byte Ordered partitioner distribute data around the ring differently. The partitioner also computes the token that rows in a memtable are sorted by, so with RP rows end up in hash order, while with the BOP they end up in key (byte) order. Actually, you are also missing the default partitioner, the Murmur3Partitioner, which has the same goal as the Random partitioner: to ensure even data distribution. In a new cluster you should use the Murmur3 partitioner; the differences between the two are covered here: Which is better partioner. Random or Murmur3 in cassandra in termo of throughput and what is the diffence b/w them?

The BOP is still included for backward compatibility reasons, and really shouldn't be used anymore. The reasons you should avoid the BOP have also been discussed at length: Cassandra ByteOrderedPartitioner
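A small Python sketch of why the partitioner affects memtable order, assuming rows are kept sorted by token (the DataStax write-path excerpt says memtables are sorted by token before flushing). The two token functions are hypothetical stand-ins: MD5 in place of Random/Murmur3 hashing, and the raw key bytes for the BOP.

```python
import hashlib
from typing import Callable, List


def hashed_token(key: str) -> int:
    """Stand-in for RandomPartitioner/Murmur3Partitioner: the token is a hash of the key (MD5 here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def byte_ordered_token(key: str) -> bytes:
    """Stand-in for ByteOrderedPartitioner: the token is the raw key bytes."""
    return key.encode("utf-8")


def memtable_order(keys: List[str], token_fn: Callable) -> List[str]:
    """Order the rows would take in a memtable sorted by token."""
    return sorted(keys, key=token_fn)


keys = ["date", "apple", "elderberry", "banana", "cherry"]
print("hash order:", memtable_order(keys, hashed_token))
print("byte order:", memtable_order(keys, byte_ordered_token))
# byte order: ['apple', 'banana', 'cherry', 'date', 'elderberry']
```

In both cases the structure is sorted; with a hashing partitioner the order merely looks arbitrary relative to the keys, which is probably where the "it's like a hashtable" description comes from.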

If rows are ordered by key, how does the memtable handle inserts? (Does an insert cause rows to be moved?)

This excerpt from the DataStax documentation page "The Write Path of an Update" explains this well. Don't mind the title: to Cassandra, inserts and updates are essentially the same operation.

The updates are streamed to disk using sequential I/O and stored in a new SSTable. During an update, Cassandra time-stamps and writes columns to disk using the write path. During the update, if multiple versions of the column exist in the memtable, Cassandra flushes only the newer version of the column to disk.
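A minimal sketch of the reconciliation described in that excerpt, assuming a simplified memtable that keys cells by (row key, column) and keeps only the version with the highest write timestamp. The Memtable class and its methods are hypothetical, and Cassandra's actual tie-breaking rule for equal timestamps is not modelled.

```python
from typing import Dict, List, Tuple


class Memtable:
    """Hypothetical, simplified memtable: (row key, column) -> (timestamp, value)."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[str, str], Tuple[int, str]] = {}

    def write(self, key: str, column: str, value: str, timestamp: int) -> None:
        """Insert and update are the same operation: keep only the newest version."""
        current = self.cells.get((key, column))
        # Last write wins by timestamp; equal timestamps keep the existing
        # value here (Cassandra's real tie-breaking rule is not modelled).
        if current is None or timestamp > current[0]:
            self.cells[(key, column)] = (timestamp, value)

    def flush(self) -> List[Tuple[str, str, int, str]]:
        """Emit only the surviving (newest) version of each column."""
        return [(k, c, ts, v) for (k, c), (ts, v) in self.cells.items()]


mt = Memtable()
mt.write("user1", "email", "old@example.com", timestamp=100)
mt.write("user1", "email", "new@example.com", timestamp=200)    # an "update"
mt.write("user1", "email", "stale@example.com", timestamp=150)  # arrives late, older
print(mt.flush())  # [('user1', 'email', 200, 'new@example.com')]
```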

This excerpt from the documentation page "The Write Path to Compaction" answers the last part of the question:

To flush the data, Cassandra sorts memtables by token and then writes the data to disk sequentially.
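On whether inserts move rows: a sorted array would have to shift elements on every insert, but a skip list only relinks a few pointers, and a flush is then a single in-order walk. The Python sketch below assumes a skip-list-backed memtable ordered by (token, key); my understanding is that several Cassandra versions use Java's ConcurrentSkipListMap for this, but treat that as an assumption about internals rather than something the quoted documentation states. The classes, the MD5 token stand-in, and MAX_LEVEL are hypothetical, and the code is single-threaded.

```python
import hashlib
import random
from typing import List, Optional, Tuple

MAX_LEVEL = 8  # hypothetical cap on skip-list height


def token(key: str) -> int:
    """MD5-based token stand-in (first 8 bytes), not Cassandra's Murmur3."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Node:
    def __init__(self, tok: int, key: str, value: str, level: int) -> None:
        self.tok = tok
        self.key = key
        self.value = value
        self.forward: List[Optional["Node"]] = [None] * level


class SkipListMemtable:
    """Single-threaded skip list ordered by (token, key)."""

    def __init__(self, seed: int = 0) -> None:
        self.head = Node(-1, "", "", MAX_LEVEL)  # sentinel, never compared
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        lvl = 1
        while lvl < MAX_LEVEL and self.rng.random() < 0.5:
            lvl += 1
        return lvl

    def put(self, key: str, value: str) -> None:
        """Insert or overwrite a row; existing nodes are relinked, never moved."""
        target = (token(key), key)
        update: List[Node] = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            nxt = node.forward[i]
            while nxt is not None and (nxt.tok, nxt.key) < target:
                node = nxt
                nxt = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and (nxt.tok, nxt.key) == target:
            nxt.value = value  # same row key: overwrite in place
            return
        lvl = self._random_level()
        self.level = max(self.level, lvl)  # new levels start from the head
        new = Node(target[0], key, value, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def flush(self) -> List[Tuple[int, str, str]]:
        """Walk the bottom level: rows come out already in token order."""
        rows = []
        node = self.head.forward[0]
        while node is not None:
            rows.append((node.tok, node.key, node.value))
            node = node.forward[0]
        return rows


mt = SkipListMemtable()
for key, value in [("user3", "c"), ("user1", "a"), ("user2", "b"), ("user1", "a2")]:
    mt.put(key, value)
rows = mt.flush()
print([(key, value) for _, key, value in rows])
```

Because the structure is always sorted, "sorting" at flush time amounts to iterating it, which is what allows the data to be written to disk sequentially.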

Does it have anything to do with the primary index (the row key index) implicitly maintained by Cassandra?

If I understand the question correctly, this Cassandra 1.1 document (About Indexes in Cassandra) may be a little dated, but it explains this and compares it with its RDBMS counterpart:

In Cassandra, the primary index for a column family is the index of its row keys. Each node maintains this index for the data it manages.

Rows are assigned to nodes by the cluster-configured partitioner and the keyspace-configured replica placement strategy. The primary index in Cassandra allows looking up of rows by their row key. Since each node knows what ranges of keys each node manages, requested rows can be efficiently located by scanning the row indexes only on the relevant replicas.
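A rough Python sketch of that lookup path, assuming a hypothetical four-node ring, SimpleStrategy-style replica placement (the node owning the key's token plus the next rf-1 nodes clockwise), and a per-node primary index that maps row keys to data-file offsets. All names, tokens, and offsets are illustrative, and the MD5 token is only a stand-in for the configured partitioner.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical 4-node ring: sorted node tokens over MD5's range [0, 2**128).
TOKENS = [2**126, 2**127, 3 * 2**126, 2**128 - 1]
NODES = ["node-a", "node-b", "node-c", "node-d"]


def token_for(key: str) -> int:
    """MD5-based token (a stand-in; Murmur3 is not in Python's standard library)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas(key: str, rf: int) -> List[str]:
    """SimpleStrategy-style placement: the owner of the key's token, then the next rf-1 nodes clockwise."""
    start = bisect.bisect_left(TOKENS, token_for(key)) % len(TOKENS)
    return [NODES[(start + i) % len(NODES)] for i in range(min(rf, len(NODES)))]


class PrimaryIndex:
    """Hypothetical per-node primary index: sorted row keys mapped to data-file offsets."""

    def __init__(self, entries: List[Tuple[str, int]]) -> None:
        entries = sorted(entries)  # sorted by raw key here, for simplicity
        self.keys = [k for k, _ in entries]
        self.offsets = [o for _, o in entries]

    def lookup(self, row_key: str) -> Optional[int]:
        """Binary-search the row key; return its offset, or None if this node lacks it."""
        i = bisect.bisect_left(self.keys, row_key)
        if i < len(self.keys) and self.keys[i] == row_key:
            return self.offsets[i]
        return None


def locate(key: str, rf: int, indexes: Dict[str, PrimaryIndex]) -> Dict[str, Optional[int]]:
    """Consult the primary index only on the replicas responsible for the key."""
    return {node: indexes[node].lookup(key) for node in replicas(key, rf)}


keys = ["user1", "user2", "user3", "user4"]
indexes = {}
for n in NODES:
    held = [k for k in keys if n in replicas(k, rf=2)]
    indexes[n] = PrimaryIndex([(k, 4096 * i) for i, k in enumerate(held)])
print(locate("user3", rf=2, indexes=indexes))
```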

Hope this answers your questions.

References:

Strickland, R. (2014). Cassandra High Availability (pp. 19-24). Birmingham, UK: Packt Publishing Ltd.
