2

I have multiple listener threads reading a stream of messages (Kafka). Each message has an identifier. The consumers/stream guarantee at-least-once consumption, and most of the time the stream delivers each message exactly once. The number of messages to expect is known beforehand. When all messages have been received, I want to shut down all listener threads. There can be at most 50 million messages. What data structure is most suitable for this?

I was thinking of using std::set or std::map, with each thread locking a mutex on every insertion. Could a single thread actually be faster in such a use case? Is there something more optimal?

Mukul Gupta

1 Answer

3

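std::unordered_map would be better: insertion and lookup are O(1) on average, versus O(log n) for std::set and std::map. But you should also consider something like HyperLogLog, which estimates the number of distinct identifiers in a small, fixed amount of memory. Note that the count it gives is approximate, not exact.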
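A minimal sketch of the hash-container idea, under a few assumptions that are not in the question: identifiers fit in a 64-bit integer, std::unordered_set is used instead of std::unordered_map because only the IDs need storing, and the names SeenTracker and run_listeners are hypothetical. run_listeners merely simulates the Kafka listeners by replaying the same IDs in every thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical helper: tracks the distinct message IDs seen by all threads.
class SeenTracker {
public:
    explicit SeenTracker(std::size_t expected) : expected_(expected) {
        assert(expected > 0);
        seen_.reserve(expected);  // size the table once to avoid rehashing
    }

    // Records an ID (duplicates are ignored by the set). Returns true once
    // the number of distinct IDs has reached the expected count.
    bool record(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        seen_.insert(id);
        const bool complete = seen_.size() >= expected_;
        if (complete) done_.store(true, std::memory_order_release);
        return complete;
    }

    // Lock-free check that listeners can poll to decide when to shut down.
    bool done() const { return done_.load(std::memory_order_acquire); }

private:
    const std::size_t expected_;
    std::mutex mutex_;
    std::unordered_set<std::uint64_t> seen_;
    std::atomic<bool> done_{false};
};

// Hypothetical stand-in for the listeners: every thread replays the same IDs,
// so duplicates are guaranteed, and each stops once the tracker is done.
void run_listeners(SeenTracker& tracker, unsigned num_threads,
                   const std::vector<std::uint64_t>& ids) {
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&tracker, &ids] {
            for (std::uint64_t id : ids) {
                if (tracker.done()) break;
                tracker.record(id);
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

With a single mutex, the threads serialize on every insertion, so whether several listeners beat one depends on how much work each does outside the lock; that is worth measuring rather than assuming.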

Martin Rozkin