I need a faster membership lookup for some legacy packet-processing code that has to check whether a packet with a particular ID is in a particular list.

The list is updated only every few seconds, while packet matching happens very often, so lookup performance matters more than insertion or deletion performance.

General Flow:

for each specialPktId in specialPacketIds
{
  pktPidSet.insert(specialPktId);
}

while (1)
{
  pkt = readPkt();
  pktID = getPktIdOfPkt(pkt);

  if ( aSpecialPkt(pktID) )
    doSomething();
}

And right now, aSpecialPkt(pktId) is defined as:

bool PktProcessor::aSpecialPkt(unsigned short pid)
{
  return pktPidSet.find(pid) != pktPidSet.end();
}

gprof reports a lot of time spent in std::set::find().

A packet ID has only 8192 possible values, so allocating a linear array indexed by ID would be much faster at the expense of memory. Something like:

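class LinearSet
{
public:
  LinearSet() : mPktIdSet() {}  // value-initialize every flag to false
  void insert(unsigned short pid) { mPktIdSet[pid] = true; }
  bool elementExists(unsigned short pid) const { return mPktIdSet[pid]; }
private:
  bool mPktIdSet[8192];  // one flag per possible packet ID
};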

My question is whether there is a more "C++" way of doing this while maintaining top performance?

Danny
  • Did you try `std::unordered_set` or just `std::vector`? – 5gon12eder Feb 26 '16 at 14:36
  • Right now the set is defined as std::set. But it seems unordered_set is only available in C++11, and this old code won't build as C++11 without a lot of work. – Danny Feb 26 '16 at 14:41
  • @Danny: The complexity for find of a std::set is logarithmic and for a std::unordered_set constant on average (worst case linear) –  Feb 26 '16 at 14:47
  • @DieterLücking You are right. I think I should stop posting stupid things around here....I deleted my comment. – Simon Kraemer Feb 26 '16 at 15:28

2 Answers

If you know that there are precisely 8192 possibilities, your best bet is probably std::bitset<8192>, which will use a kilobyte and is very cache-friendly.
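As a minimal sketch of that idea, assuming every packet ID is below 8192: the class name `PktIdSet` and its member names are hypothetical and not taken from the original code. `std::bitset` has been in the standard library since C++98, so this should not need C++11.

```cpp
#include <bitset>

// Hypothetical wrapper around std::bitset; assumes packet IDs are 0..8191.
class PktIdSet
{
public:
  // set() and reset() with an index throw std::out_of_range if pid >= 8192.
  void insert(unsigned short pid) { mBits.set(pid); }
  void erase(unsigned short pid)  { mBits.reset(pid); }

  // reset() with no argument clears every bit.
  void clear() { mBits.reset(); }

  // test() is bounds-checked as well; it returns the bit at position pid.
  bool contains(unsigned short pid) const { return mBits.test(pid); }

private:
  std::bitset<8192> mBits;  // default-constructed with all bits zero
};
```

With something like this in place, `aSpecialPkt(pid)` could simply return `pktPidSet.contains(pid)`. If the ID is already known to be in range, `operator[]` on the bitset skips the bounds check that `test()` performs.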

rici
  • Excellent choice. Just adding, for future reference, that if the size is much much much larger, and some falses are OK, then a [bloom filter](https://github.com/mavam/libbf) might be appropriate. – Ami Tavory Feb 26 '16 at 14:53
  • That's perfect. Thanks! – Danny Feb 26 '16 at 15:14
  • @AmiTavory Very nice suggestion - a good use-case for the elusive bloom filter. :) – erip Feb 26 '16 at 15:57

std::bitset<8192> is a good choice, but it really depends on your platform as well as the number of special packet IDs. See this question: Choosing between set<int> vs. vector<bool> vs. vector<boolean_t> to use as a bitmap (bitset / bit array)
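One of the alternatives named in that question's title is `vector<bool>`. A rough sketch for trying it behind a similar interface, for example to measure it on your own platform; the class name `DynamicPktIdSet` and its members are hypothetical, and this is not a claim about which option is faster.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alternative: the ID range is chosen at run time
// instead of being fixed as a template argument.
class DynamicPktIdSet
{
public:
  explicit DynamicPktIdSet(std::size_t range) : mFlags(range, false) {}

  // at() throws std::out_of_range if pid is outside the configured range.
  void insert(std::size_t pid) { mFlags.at(pid) = true; }

  // Out-of-range IDs are reported as "not present" rather than throwing.
  bool contains(std::size_t pid) const
  {
    return pid < mFlags.size() && mFlags[pid];
  }

private:
  std::vector<bool> mFlags;  // specialized container, typically bit-packed
};
```

Keeping the two classes interchangeable makes it easy to profile both with gprof against the real packet stream before committing to one.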

Daniel