Suppose I have a key/value pair where the value is a logical list of strings to which I can append. To avoid rewriting the entire list every time a single string is appended, I'd represent it with multiple key-value pairs:
Key -> metadata of the value such as length and subkey format
Key-l1 -> value of item 1 in list
Key-l2 -> value of item 2 in list
Key-ln -> value of item n, the latest item in the list
I'd override the key comparator in RocksDB so that keys of the form Key-ln sort by the Key part first and by n second (i.e. entries are grouped by Key and, within the same Key, ordered by n). This way, all the list items, along with their root key and metadata, are grouped together in the SST files, both during the initial bulk insert and during later compactions.
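To make the intended ordering concrete, here is a minimal sketch of the comparison logic only, written against std::string_view so it does not depend on RocksDB. The function names SplitListKey and CompareListKeys are made up for illustration, and the sketch assumes the item suffix is the last "-l" followed by decimal digits, that indices start at 1, and that they fit in 64 bits. In RocksDB this logic would presumably live in the Compare method of a rocksdb::Comparator subclass.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>

// Splits "Key-l<digits>" into {"Key", n}. Anything without such a suffix is
// treated as a root/metadata key and gets index 0, so it sorts before items.
// Assumption: indices start at 1 and fit in 64 bits (no overflow check).
std::pair<std::string_view, std::uint64_t> SplitListKey(std::string_view k) {
  const std::size_t pos = k.rfind("-l");
  if (pos == std::string_view::npos || pos + 2 >= k.size()) return {k, 0};
  std::uint64_t n = 0;
  for (char c : k.substr(pos + 2)) {
    if (c < '0' || c > '9') return {k, 0};  // not a list-item suffix
    n = n * 10 + static_cast<std::uint64_t>(c - '0');
  }
  return {k.substr(0, pos), n};
}

// Three-way comparison: negative, zero, or positive.
int CompareListKeys(std::string_view a, std::string_view b) {
  auto [base_a, idx_a] = SplitListKey(a);
  auto [base_b, idx_b] = SplitListKey(b);
  // Group by the Key part first (plain bytewise order).
  if (int c = base_a.compare(base_b); c != 0) return c;
  // Within one Key, order numerically by n (so l2 sorts before l10).
  if (idx_a != idx_b) return idx_a < idx_b ? -1 : 1;
  // Tie-break on the raw bytes so distinct keys never compare equal.
  return a.compare(b);
}
```

With the default bytewise order, Key-l10 would sort before Key-l2, and a key such as Key! would land between Key and Key-l1; the ordering above avoids both. A key whose own text happens to end in "-l" plus digits would be misread as a list item, so a real implementation would likely want a less ambiguous encoding, such as a fixed-width binary suffix.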
Appending a new list item becomes: (1) read the metadata stored under Key to get the current list size n; (2) insert Key-l(n+1) with the new value and update the size in the metadata. Deleting a list item works as usual in RocksDB: delete Key-ln and update the metadata. To ensure consistency, (1) and (2) will be done inside a RocksDB transaction.
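The append steps can be sketched roughly as follows. This uses a std::map as an in-memory stand-in for the store, not RocksDB itself, and it assumes the metadata value is nothing more than the list length in decimal; Store and AppendToList are names invented for the example. In RocksDB the read and both writes would go through one transaction, with the metadata read done via something like Transaction::GetForUpdate so that two concurrent appenders conflict instead of both writing Key-l(n+1).

```cpp
#include <cstdint>
#include <map>
#include <string>

// In-memory stand-in for the key/value store (hypothetical; not RocksDB).
using Store = std::map<std::string, std::string>;

// Appends `item` to the list rooted at `key` and returns its 1-based index.
// Assumption: the metadata value is just the list length in decimal.
std::uint64_t AppendToList(Store& store, const std::string& key,
                           const std::string& item) {
  // Step 1: read the metadata to learn the current length n (0 if absent).
  std::uint64_t n = 0;
  if (auto it = store.find(key); it != store.end()) {
    n = std::stoull(it->second);
  }

  // Step 2: write the new item under Key-l(n+1) and bump the stored length.
  // With RocksDB, step 1 and both writes here would share one transaction.
  const std::uint64_t next = n + 1;
  store[key + "-l" + std::to_string(next)] = item;
  store[key] = std::to_string(next);
  return next;
}
```

Calling AppendToList twice on an empty store with key "Key" leaves three entries: Key holding "2", plus Key-l1 and Key-l2 holding the two items.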
Does this design seem OK?
Now, if I want to add another feature, a TTL for the entire key-value (list), I'd use the TTL support already in RocksDB. My understanding is that removal of expired items under TTL happens during compaction. However, such compaction is not done under a transaction, and RocksDB doesn't know that the Key (metadata) entry and the Key-ln entries are related. It is entirely possible that there is a time window in which the root entry (Key -> metadata) has been deleted while the child entries (Key-ln) have not been deleted yet, or the reverse. If someone reads or updates the list during this window, they will see an inconsistent state for the list. Is there any remedy for this?
Thanks