I have huge data set(key-value) in Rocks DB and I have to search for key based on prefix of key in hand. I do not want to scan whole data set to filter out key based on key-prefix. is there any way to do that?
3 Answers
You can use something like this. Using RocksIterator there is a api exposed where you can seek to the key substring and if your key starts with the prefix then consider that key.
Please find the sample code.
List<String> result = new ArrayList<String>();
RocksIterator iterator = db.newIterator();
for (iterator.seek(prefix.getBytes()); iterator.isValid(); iterator
.next()) {
String key = new String(iterator.key());
if (!key.startsWith(prefix))
break;
result.add(String.format("%s", new String(iterator.key())));
}
Hope it will help you.

- 86
- 1
- 5
The answer of @Pramatha V works pretty well, although I made some improvements to the code. I am not deserializing the iterator key in every iteration. I am using the Bytes.increment()
from the Kafka common utils (you can extract this class and use it in your code directly). This function increments the underlying byte array by adding 1. With this approach, I can find the next bigger key than my prefix key. I am using the BYTES_LEXICO_COMPARATOR
(also from the same class) to make the comparison, but you are free to implement and use your comparator. Moreover, the function returns a map of byte arrays, which you can deserialize later in your code.
public Map<byte[], byte[]> prefixScan(final byte[] prefix) {
final Map<byte[], byte[]> result = new HashMap<>();
RocksIterator iterator = db.newIterator();
byte[] rawLastKey = increment(prefix);
for (iterator.seek(prefix); iterator.isValid(); iterator.next()) {
if (Bytes.BYTES_LEXICO_COMPARATOR.compare(iterator.key(), rawLastKey) > 0
|| Bytes.BYTES_LEXICO_COMPARATOR.compare(iterator.key(), rawLastKey) == 0) {
break;
}
result.put(iterator.key(), iterator.value());
}
iterator.close();
return result;
}

- 45
- 10
Seek is working very slow. 5.35 Seconds on SSD disk , 1 billion records.
The size of the Keys are fixed 16 bytes. Searched for 8 bytes.
2 Long bytes [xx,xx]
Searched for 1 Long as 8 bytes.
Use ColumnFamily for mapping keys.

- 93
- 1
- 9
-
Please add some explanation to your answer such that others can learn from it – ManUtopiK Jun 21 '20 at 22:32