
Is there any way to improve the performance of this function? I have to call it roughly 10-100 million times per calculation, so it is slowing down my program, and most of the time is spent on this line:

myHand_param.FinalHandVector.push_back((1 << static_cast<int>(std::ceil((i - 3.0) / 4.0))));

I know the vector never needs to hold more than 5 elements, but I don't know how to skip the bounds checking. I also tried working with an int* instead of a vector, but that didn't help either.

void CalculateKickers(Hand& myHand_param, int kickersNeeded)
{
    int countKickers{ 0 };

    for(int i = bits-1; i > 0 ; i--)
    {
        if (static_cast<std::bitset<bits>>(myHand_param.handMask)[i] == 1)
        {
            if (static_cast<std::bitset<1>>((static_cast<std::bitset<bits>>(myHand_param.FinalHandUsedCardFacesMask) >> i).to_ullong())== 0)
            {
                ++countKickers;
                myHand_param.FinalHandVector.push_back((1 << static_cast<int>(std::ceil((i - 3.0) / 4.0))));

                if (countKickers == kickersNeeded)
                {
                    break;  // enough kickers found, stop scanning
                }
            }
        }   
    }
}
user438383
  • a) Does calling `myHand_param.FinalHandVector.reserve(5);` before the loop help at all? b) Are you compiling with optimizations enabled? – Borgleader Jul 06 '22 at 04:25
  • If you expect to push a bunch of values onto a vector, consider calling `reserve` up front to avoid reallocations. Also, describe how you are benchmarking this code. Did you compile with full optimizations? Debug / unoptimized builds can run very slowly, especially with vectors. It's very unlikely that the vector size check alone is causing a bottleneck, given how much other branching you already do. It _might_ be the reallocations, but I suspect the problem could be elsewhere. – paddy Jul 06 '22 at 04:25
  • A likely suspect is `std::ceil((i - 3.0) / 4.0)` -- a bunch of `double` arithmetic and a call to the math library. This line might show up as a hot path during instrumentation and lead you to believe the `push_back` is at fault, whereas it could very well be the calculation. Consider precalculating these values in a table somewhere, or at the very least use only integer arithmetic or bit manipulation to compute them. – paddy Jul 06 '22 at 04:27
  • Another option, if you can spare the extra memory: store a `std::array` plus a size instead of a vector, to avoid the dynamic allocations and extra indirections. – Borgleader Jul 06 '22 at 04:37
  • What bounds checking do you mean here? In optimized mode there should be no bounds checking in `push_back`... – user1782685 Jul 06 '22 at 04:37
  • The call to `std::ceil` can be optimized away, since you only use non-negative integer arithmetic there. `i / 4` is probably an equivalent operation. – user1782685 Jul 06 '22 at 04:38
  • Those `bitset` conversions look unnecessarily complicated. – molbdnilo Jul 06 '22 at 07:17
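
The `std::ceil` suggestion in the comments can be checked with a small sketch. For a non-negative integer `i`, `ceil((i - 3) / 4)` equals `floor(i / 4)`, so the shift amount reduces to `i >> 2`. The function names `kickerBitFloat` and `kickerBitInt` are hypothetical and exist only for this comparison; they do not appear in the question.

```cpp
#include <cmath>

// Expression from the question: double arithmetic plus a math-library call.
inline int kickerBitFloat(int i) {
    return 1 << static_cast<int>(std::ceil((i - 3.0) / 4.0));
}

// Integer-only form, valid for i >= 0:
// ceil((i - 3) / 4) == floor(i / 4), which is i >> 2.
constexpr int kickerBitInt(int i) {
    return 1 << (i >> 2);
}

// Spot checks at compile time: bits 0..3 map to shift 0, bits 4..7 to shift 1.
static_assert(kickerBitInt(3) == 1, "bits 0..3 share shift 0");
static_assert(kickerBitInt(4) == 2, "bits 4..7 share shift 1");
```

Whether this is where the time actually goes still needs to be confirmed with a profiler on an optimized build, as the comments point out.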

0 Answers
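
Pulling the comment suggestions together, a minimal sketch of the function might look like the following. It is not the asker's actual code: the `Hand` layout, the 64-bit mask types, the `FinalHand` / `FinalHandSize` members (a fixed array plus size standing in for `FinalHandVector`), and `bits = 52` are all assumptions, since the question does not show them. It replaces the `bitset` conversions with plain shifts and masks and uses `i >> 2` in place of the `std::ceil` expression, which gives the same result for non-negative `i`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int bits = 52;  // assumption: the question does not give this value

// Hypothetical stand-in for the question's Hand type.
struct Hand {
    std::uint64_t handMask = 0;
    std::uint64_t FinalHandUsedCardFacesMask = 0;
    std::array<int, 5> FinalHand{};   // fixed storage instead of FinalHandVector
    std::size_t FinalHandSize = 0;    // number of valid entries in FinalHand
};

inline void CalculateKickers(Hand& hand, int kickersNeeded) {
    // Candidate kickers: bits set in the hand that are not already used.
    const std::uint64_t candidates =
        hand.handMask & ~hand.FinalHandUsedCardFacesMask;

    int countKickers = 0;
    // Like the original loop, this scans from the top bit down and never
    // examines bit 0.
    for (int i = bits - 1; i > 0; --i) {
        if ((candidates >> i) & 1u) {
            // Debug-only capacity check; compiled out when NDEBUG is defined.
            assert(hand.FinalHandSize < hand.FinalHand.size());
            hand.FinalHand[hand.FinalHandSize++] = 1 << (i >> 2);
            if (++countKickers == kickersNeeded) {
                break;
            }
        }
    }
}
```

The fixed array avoids heap allocation entirely, at the cost of a hard limit of five entries, so the caller has to guarantee that no more than five values are ever stored. Any speedup should be measured on an optimized build before drawing conclusions.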