I have a bitset which I am using to track whether an item is present or not. For example,
b = 01100110000
represents that the 2nd and 3rd items are present and the 1st and 4th items are not present.
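To make the encoding concrete, here is a minimal sketch of how I read that string, assuming item k (counting from 1) maps to bit k-1 of a `java.util.BitSet`. The class and method names (`PresenceBits`, `fromString`, `isPresent`) are made up for illustration.

```java
import java.util.BitSet;

class PresenceBits {
    // Builds a BitSet from a string of '0'/'1' characters, left to right.
    static BitSet fromString(String s) {
        BitSet bits = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    // Items are counted from 1 (assumption), bits from 0.
    static boolean isPresent(BitSet bits, int item) {
        return bits.get(item - 1);
    }

    public static void main(String[] args) {
        BitSet b = fromString("01100110000");
        for (int item = 1; item <= 4; item++) {
            System.out.println("item " + item + " present: " + isPresent(b, item));
        }
    }
}
```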
While searching for a library that can optimise this bitset, I came across Roaring bitmaps, which sounded very exciting.
I did a quick test with it:

    import java.util.BitSet;
    import java.util.Random;
    import org.roaringbitmap.RoaringBitmap;

    public class BitmapSizeTest {
        public static void main(String[] args) {
            RoaringBitmap roaringBitmap = new RoaringBitmap();
            BitSet bitSet = new BitSet(5000);
            double prob = 0.001; // probability that any given bit is set
            Random random = new Random();
            for (int i = 0; i < 5000; i++) {
                if (random.nextDouble() < prob) {
                    bitSet.set(i);
                    roaringBitmap.add(i);
                }
            }
            System.out.println(bitSet.cardinality());
            // BitSet.size() returns the number of bits of backing storage, not bytes
            System.out.println("bitset bits: " + bitSet.size());
            // getSizeInBytes() is multiplied by 8, so both figures are in bits
            System.out.println("RoaringBitmap bits: " + roaringBitmap.getSizeInBytes() * 8);
        }
    }

Basically, we set some values and then check the overall size of each data structure.
Running this with several prob values, I got (sizes in bits):

prob | bitset bits | RoaringBitmap bits |
---|---|---|
0.001 | 5056 | 288 |
0.01 | 5056 | 944 |
0.1 | 5056 | 7872 |
0.999 | 5056 | 65616 |
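
A note on units: `BitSet.size()` reports the number of bits of allocated storage, and the Roaring figure is `getSizeInBytes() * 8`, so the numbers in the table are bit counts rather than bytes. A small sketch that converts them to bytes (the class `SizeUnits` and the helper `bitsToBytes` are illustrative names, not part of any library):

```java
import java.util.BitSet;

class SizeUnits {
    // Converts a size in bits to whole bytes, rounding up.
    static long bitsToBytes(long bits) {
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet(5000);
        // size() is the allocated storage in bits, always a multiple of 64
        System.out.println("bitset: " + bitSet.size() + " bits = "
                + bitsToBytes(bitSet.size()) + " bytes");

        long[] roaringBits = {288, 944, 7872, 65616}; // RoaringBitmap column of the table
        for (long bits : roaringBits) {
            System.out.println("roaring: " + bits + " bits = " + bitsToBytes(bits) + " bytes");
        }
    }
}
```

In bytes, the bitset is 632 in every row, while the Roaring column becomes 36, 118, 984 and 8202.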
As you can see, the memory footprint of RoaringBitmap grows as we insert more and more numbers.
- Is this expected?
- In the worst case, should it not just fall back to a bitset-based implementation?
- Can't 0.999 be treated as the inverse of 0.001, so that it takes the same space as the 0.001 case?
- What is the most efficient way to represent these bitsets as a String when making inter-service calls with the Jackson library (rather than with a byte-based serialisation library)?
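
For the last point, one option I can think of is to Base64-encode the raw bytes so they travel as an ordinary JSON string. The sketch below does this for a plain `java.util.BitSet` using only the JDK. `BitSetStringCodec`, `encode` and `decode` are hypothetical names, and I am not claiming this is the most compact representation.

```java
import java.util.Base64;
import java.util.BitSet;

class BitSetStringCodec {
    // Encodes the bits as Base64 text over the bytes returned by BitSet.toByteArray().
    static String encode(BitSet bits) {
        return Base64.getEncoder().encodeToString(bits.toByteArray());
    }

    // Rebuilds a BitSet from a string produced by encode().
    static BitSet decode(String text) {
        return BitSet.valueOf(Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(5000);
        bits.set(1);
        bits.set(2);
        bits.set(4999);

        String text = encode(bits);
        System.out.println("encoded length: " + text.length() + " chars");
        System.out.println("round trip ok: " + decode(text).equals(bits));
    }
}
```

Presumably the same approach would work for the bytes written by `RoaringBitmap.serialize(DataOutput)`, with the Base64 text placed in a String field of the Jackson-mapped object, but I have not measured how the resulting sizes compare.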