Use RocksDB to support key-key-value (RowKey->Containers) by splitting the container

Question

Suppose I have a key/value pair where the value is a logical list of strings that I can append to. To avoid rewriting the entire list every time a single string item is appended, I'd use multiple key-value pairs to represent it:

Key -> metadata of the value, such as its length and subkey format
Key-l1 -> value of item 1 in the list
Key-l2 -> value of item 2 in the list
...
Key-ln -> the latest value in the list

I'd override the key comparator in RocksDB so that keys of the form Key-ln sort by the Key part first and by the item index n second (i.e., grouped and sorted by Key, and within the same Key sorted by n). This way, all the items of a list, together with its root key and metadata, stay grouped together in the SST files, both during the initial bulk insert and after later compactions.
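A minimal sketch of that ordering, in plain C++ so it runs without RocksDB. The key format ("<key>" for metadata, "<key>-l<n>" for items) follows the question; the helper names ParseListKey and CompareListKeys are hypothetical, and the parser assumes user keys never themselves end in "-l<digits>". In RocksDB this logic would live in the Compare method of a rocksdb::Comparator subclass, which must also implement Name, FindShortestSeparator and FindShortSuccessor.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical key layout from the question:
//   "<key>"       -> metadata
//   "<key>-l<n>"  -> list item n (n >= 1, decimal, no padding)
// Assumes user keys never themselves end in "-l<digits>".

// Splits a raw key into (user key, item index); index 0 means metadata.
std::pair<std::string, uint64_t> ParseListKey(const std::string& raw) {
  std::size_t pos = raw.rfind("-l");
  if (pos != std::string::npos && pos + 2 < raw.size()) {
    uint64_t n = 0;
    bool all_digits = true;
    for (std::size_t i = pos + 2; i < raw.size(); ++i) {
      char c = raw[i];
      if (c < '0' || c > '9') {
        all_digits = false;
        break;
      }
      n = n * 10 + static_cast<uint64_t>(c - '0');
    }
    if (all_digits) return {raw.substr(0, pos), n};
  }
  return {raw, 0};  // no item suffix: this is the metadata key
}

// Three-way compare: group by user key, then order numerically by index,
// so the metadata key (index 0) sorts first and "-l10" sorts after "-l2".
int CompareListKeys(const std::string& a, const std::string& b) {
  std::pair<std::string, uint64_t> pa = ParseListKey(a);
  std::pair<std::string, uint64_t> pb = ParseListKey(b);
  int c = pa.first.compare(pb.first);
  if (c != 0) return c < 0 ? -1 : 1;
  if (pa.second != pb.second) return pa.second < pb.second ? -1 : 1;
  return 0;
}
```

The numeric comparison of the index matters: under plain bytewise ordering "Key-l10" would sort before "Key-l2".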

Appending a new list item then becomes: (1) read the Key metadata to get the current list size n; (2) insert Key-l(n+1) with the new value and update the metadata. Deleting a list item works as usual in RocksDB: delete Key-ln and update the metadata. To ensure consistency, (1) and (2) will be done inside a RocksDB transaction.
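A sketch of these append/delete steps, using a std::map guarded by a std::mutex as a stand-in for a RocksDB TransactionDB: holding the lock across the read and the writes plays the role of the transaction. ListStore and its methods are hypothetical. With real RocksDB you would typically read the metadata with Transaction::GetForUpdate, Put the new item and the updated metadata, then Commit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Toy stand-in for a transactional KV store (NOT RocksDB): a std::map plus a
// mutex. Key ordering does not matter here because only point lookups are used.
class ListStore {
 public:
  // Appends `value` to the list under `key`; returns the new item's index.
  uint64_t Append(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    uint64_t next = ReadLastIndex(key) + 1;  // step (1): read metadata
    kv_[ItemKey(key, next)] = value;         // step (2): write Key-l(n+1)
    kv_[key] = std::to_string(next);         // ...and update the metadata
    return next;
  }

  // Deletes item `index`. The metadata tracks the highest index ever
  // assigned, so it needs no change and later appends never reuse a key.
  bool Delete(const std::string& key, uint64_t index) {
    std::lock_guard<std::mutex> lock(mu_);
    return kv_.erase(ItemKey(key, index)) == 1;
  }

  bool Get(const std::string& key, uint64_t index, std::string* out) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = kv_.find(ItemKey(key, index));
    if (it == kv_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  static std::string ItemKey(const std::string& key, uint64_t n) {
    return key + "-l" + std::to_string(n);
  }
  uint64_t ReadLastIndex(const std::string& key) const {
    auto it = kv_.find(key);
    return it == kv_.end() ? 0 : std::stoull(it->second);
  }

  mutable std::mutex mu_;
  std::map<std::string, std::string> kv_;
};
```

One deliberate deviation from the question: the metadata here records the highest index ever assigned rather than the live item count. If it stored the count, deleting a middle item and then appending would compute an index that is still in use and overwrite a live item.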

Does this design seem OK?

Now, if I want to add another feature, a TTL for the entire key-value (list), I'd use the TTL support already in RocksDB. My understanding is that expired items are removed during compaction. However, compaction is not done under a transaction, and RocksDB doesn't know that Key-metadata and the Key-ln entries are related. So there can be a time window in which Key -> metadata (the root node) has been deleted while the child nodes (Key-ln) have not been deleted yet, or the reverse. If someone reads or updates the list during this window, they will see an inconsistent list. Is there any remedy for this?
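One possible read-side mitigation, sketched as a hypothetical approach (not something RocksDB provides): treat the metadata key as the source of truth, ignore orphaned items when it is missing, and report a partial read when it is present but an item has already expired. The sketch assumes an append-only list whose metadata holds the item count n with items 1..n; ReadList and ReadStatus are hypothetical names, and a std::map stands in for the store. It only detects the window on the read path; it does not make compaction transactional.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class ReadStatus { kAbsent, kOk, kPartial };

// Hypothetical reader: metadata ("<key>" -> count n) is authoritative.
// Assumes an append-only list with items "<key>-l1" .. "<key>-l<n>".
ReadStatus ReadList(const std::map<std::string, std::string>& kv,
                    const std::string& key, std::vector<std::string>* out) {
  out->clear();
  auto meta = kv.find(key);
  if (meta == kv.end()) return ReadStatus::kAbsent;  // orphans are ignored
  uint64_t n = std::stoull(meta->second);
  for (uint64_t i = 1; i <= n; ++i) {
    auto it = kv.find(key + "-l" + std::to_string(i));
    if (it == kv.end()) {
      out->clear();                 // never return a silently shortened list
      return ReadStatus::kPartial;  // an item expired before the metadata
    }
    out->push_back(it->second);
  }
  return ReadStatus::kOk;
}
```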

Thanks

`Suggested edit queue is full` – Dzmitry Lahoda, Jan 17 '22 at 18:05

score 1 · Answer 1 · answered Sep 11 '20 at 18:22

You should use a Merge Operator; it's designed for exactly this value-append use case. Your design is read-before-write, which carries a performance penalty and should in general be avoided where possible (see What's read-before-write in NoSQL?).

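#include "rocksdb/db.h"
#include "utilities/merge_operators/string_append/stringappend.h"
using namespace rocksdb;

Options options;
options.create_if_missing = true;
options.merge_operator.reset(new StringAppendOperator(','));
DB* db;
DB::Open(options, kDBPath, &db);
...
db->Merge(WriteOptions(), "key", "value1");
db->Merge(WriteOptions(), "key", "value2");

std::string result;
db->Get(ReadOptions(), "key", &result); // result is now "value1,value2"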

The above example uses the predefined StringAppendOperator, which simply appends each new value to the end, separated by the delimiter. You can define your own MergeOperator to customize the merge operation.
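For example, the comma-delimited StringAppendOperator becomes ambiguous if an item itself contains ','. A custom operator can store length-prefixed records instead. The sketch below is plain C++ with no RocksDB dependency, and EncodeItem, ListAppendMerge and DecodeItems are hypothetical names. In RocksDB the merge body would sit in the Merge method of an AssociativeMergeOperator subclass (rocksdb/merge_operator.h). Because that method may combine two operands as well as a base value and an operand, each operand is encoded as a complete record up front, which makes the merge a plain, associative concatenation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Client side: encode one item as [4-byte little-endian length][bytes]
// before passing it as a merge operand.
std::string EncodeItem(const std::string& item) {
  std::string rec;
  uint32_t len = static_cast<uint32_t>(item.size());
  for (int i = 0; i < 4; ++i) {
    rec.push_back(static_cast<char>((len >> (8 * i)) & 0xFFu));
  }
  rec.append(item);
  return rec;
}

// Merge: concatenate encoded records. `existing` is nullptr when the key has
// no base value yet, mirroring the existing_value argument in RocksDB.
bool ListAppendMerge(const std::string* existing, const std::string& operand,
                     std::string* new_value) {
  new_value->assign(existing ? *existing : std::string());
  new_value->append(operand);
  return true;
}

// Reader side: split the merged value back into items.
std::vector<std::string> DecodeItems(const std::string& value) {
  std::vector<std::string> items;
  std::size_t pos = 0;
  while (pos + 4 <= value.size()) {
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
      len |= static_cast<uint32_t>(static_cast<unsigned char>(value[pos + i]))
             << (8 * i);
    }
    pos += 4;
    if (pos + len > value.size()) break;  // truncated record: stop
    items.push_back(value.substr(pos, len));
    pos += len;
  }
  return items;
}
```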

Under the hood, merge operands are combined on the read path (and during compaction, which reduces the number of stored operands); details: Merge Operator Implementation.
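A deliberately simplified single-key model of that behavior, not RocksDB's actual implementation (which works across memtables, SST files and sequence numbers): Merge only records an operand without reading anything, Get folds the pending operands onto the base value, and Compact stores the folded result so later reads do less work. LazyMergeKey is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of one merge-enabled key (NOT RocksDB code).
class LazyMergeKey {
 public:
  // existing == nullptr means "no value yet".
  using MergeFn =
      std::function<std::string(const std::string*, const std::string&)>;

  explicit LazyMergeKey(MergeFn fn) : fn_(std::move(fn)) {}

  // Write path: just remember the operand; no read-before-write.
  void Merge(const std::string& operand) { operands_.push_back(operand); }

  // Read path: fold pending operands onto the base value, if any.
  bool Get(std::string* out) const {
    bool has = has_base_;
    std::string acc = base_;
    for (const std::string& op : operands_) {
      acc = fn_(has ? &acc : nullptr, op);
      has = true;
    }
    if (has) *out = acc;
    return has;
  }

  // Compaction: persist the folded value and drop the operands.
  void Compact() {
    std::string merged;
    if (Get(&merged)) {
      base_ = merged;
      has_base_ = true;
    }
    operands_.clear();
  }

  std::size_t pending_operands() const { return operands_.size(); }

 private:
  MergeFn fn_;
  bool has_base_ = false;
  std::string base_;
  std::vector<std::string> operands_;
};
```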

Jay Zhuang: What's the name of the header file (include filename) to utilize this StringAppendOperator merge operator? — humblecoder, Jan 09 '21 at 23:16
@humblecoder [header](https://github.com/facebook/rocksdb/blob/d057e8326d0aab83fab54dc89b0f3cf4de31b5a7/utilities/merge_operators/string_append/stringappend.h#L13), [impl](https://github.com/facebook/rocksdb/blob/d057e8326d0aab83fab54dc89b0f3cf4de31b5a7/utilities/merge_operators/string_append/stringappend.cc#L41). It's an internal class which is not meant to be used publicly; you can copy the code and modify it for your use case. — Jay Zhuang, Jan 22 '22 at 00:48

