
I have a list of 50,000,000+ 512-bit values.

I also have a stream of data coming in at 1,750,000 values per second, and I need to check whether each of those values is in that list.

Currently I'm using Redis via hiredis in C, with the EXISTS command on keys. It's quite fast, and I'm managing to check ~160,000 values per second.

However, I really need this to be about 10x faster, because the lookup is the bottleneck. Any ideas?

nathan

1 Answer


Sounds like a Bloom filter might be useful here. It can screen out values that are definitely not in the list, so only the possible matches need the full lookup. This assumes the majority of values in the input stream are not present in the list.
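As a rough illustration, here is a minimal in-memory Bloom filter sketch in C for 64-byte (512-bit) values. Everything in it is an assumption for illustration, not something from the question: the names (`bloom_t`, `bloom_init`, `bloom_add`, `bloom_maybe_contains`), the choice of seeded FNV-1a as the hash, and the double-hashing scheme used to derive the k probe positions. If the 512-bit values are already uniformly distributed (for example, cryptographic hashes), slices of the value itself could probably replace the hash function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VALUE_BYTES 64 /* one 512-bit value */

/* Hypothetical filter type for this sketch. */
typedef struct {
    uint64_t *words;   /* bit array, 64 bits per word */
    uint64_t  nbits;   /* total number of bits (m) */
    unsigned  nhashes; /* probes per value (k) */
} bloom_t;

/* 64-bit FNV-1a with a seed XORed into the offset basis. */
static uint64_t fnv1a64(const uint8_t *data, size_t len, uint64_t seed)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Allocates a zeroed bit array. Returns 0 on success, -1 on failure. */
int bloom_init(bloom_t *bf, uint64_t nbits, unsigned nhashes)
{
    assert(bf != NULL && nbits > 0 && nhashes > 0);
    bf->nbits = nbits;
    bf->nhashes = nhashes;
    bf->words = calloc((size_t)((nbits + 63) / 64), sizeof(uint64_t));
    return bf->words ? 0 : -1;
}

void bloom_free(bloom_t *bf)
{
    free(bf->words);
    bf->words = NULL;
}

/* Sets the k bits for a value; probe i is (h1 + i * h2) mod m. */
void bloom_add(bloom_t *bf, const uint8_t value[VALUE_BYTES])
{
    uint64_t h1 = fnv1a64(value, VALUE_BYTES, 0);
    uint64_t h2 = fnv1a64(value, VALUE_BYTES, 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < bf->nhashes; i++) {
        uint64_t bit = (h1 + i * h2) % bf->nbits;
        bf->words[bit >> 6] |= 1ULL << (bit & 63);
    }
}

/* Returns 0 if the value is definitely absent, 1 if it may be present. */
int bloom_maybe_contains(const bloom_t *bf, const uint8_t value[VALUE_BYTES])
{
    uint64_t h1 = fnv1a64(value, VALUE_BYTES, 0);
    uint64_t h2 = fnv1a64(value, VALUE_BYTES, 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < bf->nhashes; i++) {
        uint64_t bit = (h1 + i * h2) % bf->nbits;
        if (!(bf->words[bit >> 6] & (1ULL << (bit & 63))))
            return 0;
    }
    return 1;
}
```

The idea would be to call `bloom_add` once for each of the 50,000,000 list values at startup, then call `bloom_maybe_contains` on each incoming value and only send the ones that return 1 to Redis. By the standard Bloom filter estimate, about 10 bits per entry (roughly 60 MB for 50 million entries) with 7 probes gives a false-positive rate of around 1%. False positives only cost an extra Redis lookup; they do not produce wrong answers.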

mattnewport