I made my own implementation of HyperLogLog algorithm
. It works well, but sometimes I have to fetch a lot (around 10k-100k) of HLL structures and merge them.
I store each of them as a bit string so first I have to convert each bit string to buckets. Since there is a LOT of HLL's it takes more time than I would like it to.
Currently around 80% of runtime takes this line of code called once for each HLL:
my @buckets = map { oct '0b'.$_ } unpack('(a5)1024', $bitstring);
Is there any way to do it faster?
If we leave definition of HyperLogLog behind, task could be explained like this: given that $bitstring
consists from 1024 5-bit counters (so each counter could have value up to 32) we have to convert it to array of 1024 integers.