I am searching for a very CPU-efficient way to compute floating-point modulo one (including for negative values) in C. I am using it for normalized phase reduction (wrapping, e.g. 7.6 -> 0.6, 0.2 -> 0.2, -1.1 -> 0.9, and so on).
From what I understand, both fmod() and floor() are usually very inefficient. I don't need the function to be strict, i.e. to handle NaN or inf, since I am responsible for passing valid values.
I have always been using
m = x - (float)(int)x;
m += (float)(m<0.f);
// branchless form to add one if m is negative but not zero
According to my benchmarks, this is generally much more efficient than fmod(), or than using floor() in place of the int cast. Still, I was wondering whether an even more efficient way exists, perhaps based on bit manipulation...
I am coding on 64-bit Intel CPUs with GCC, but for my purposes I am using 32-bit single-precision floats.
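For reference, the truncation approach above can be wrapped in a small function so it is easy to drop into a benchmark. This is only a sketch of the snippet already shown: the name `wrap01_trunc` is made up for illustration, and it assumes the input is finite and fits in an `int` once truncated (the cast is undefined behaviour otherwise).

```c
/* Hypothetical helper name; wraps x into the unit interval by truncation.
 * Assumes x is finite and its truncated value fits in an int. */
static inline float wrap01_trunc(float x)
{
    /* Remove the integer part (truncation toward zero): m lies in (-1, 1). */
    float m = x - (float)(int)x;

    /* Branchless correction: (m < 0.f) evaluates to 0 or 1,
     * so 1.0f is added only when m is negative. */
    m += (float)(m < 0.f);
    return m;
}
```

For example, `wrap01_trunc(-1.1f)` gives approximately 0.9 and `wrap01_trunc(-3.0f)` gives 0. One caveat I am aware of: for a tiny negative m (say around -1e-9), adding 1 rounds to exactly 1.0f in single precision, so the result is not strictly below one in that corner case.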
I apologize if the same has been addressed somewhere else, but from my search I could not find anything about this specific topic.
EDIT: Sorry, I realized there was a subtle error in the originally posted code, so I had to fix it: 1 must be added if the result (m) is negative, not if x was negative.
EDIT2: Actually, after benchmarking the same function using x-floor(x) instead of x-(float)(int)x on GCC 12 with all math optimizations on, I must say the former is faster, since GCC is evidently smart enough to replace the floor() call with very efficient inline code (at least on my Intel i7). This may not hold for every CPU and compiler, however; in other cases, in my personal experience, both floor() and fmod() have been very inefficient. Therefore, my quest for a bit-manipulation or comparable trick that is much faster on every compiler and architecture still stands.