I'm working on an embedded project that needs to do floating-point computation. Obviously there are various ways to estimate the output and reduce compute cycles. My question is: how expensive is a floating-point compare versus an integer compare (relatively speaking, not exact cycles)? One of the operations could probably be optimized this way, but I am wondering whether it is worth the effort. The chip is a Cortex-M0 (no floating-point hardware), so all floating point is done through software.
-
Should be significantly different, hundreds of cycles? When you tried it, what did you get? – old_timer Sep 16 '15 at 20:47
-
On many such projects you can avoid `float` by using fixed-point integers. Floating point is often used simply because programmers would rather not write a few extra functions, at the cost of float-bloat. Are you really sure you need floating point? – too honest for this site Sep 16 '15 at 21:25
-
Why are you fixating on "compare" operations? In most cases other floating-point operations will dominate. – Clifford Sep 16 '15 at 22:21
-
@dwelch : I doubt "hundreds" of cycles for the compare operation; inspection of the mantissa and exponent magnitudes won't take that many instructions. However, it is an odd question: it seems likely that other aspects of using floating point will be far more significant. Of course, a comparison in an iteration may accumulate many additional cycles, but the question implied a single operation. – Clifford Sep 16 '15 at 22:26
-
You might do better to ask a question about optimising a specific algorithm rather then "sweating the small stuff". – Clifford Sep 16 '15 at 22:30
-
Based on working on FPU-less ARM chips some 10 years ago, whose capabilities were similar to a Cortex-M0, single-precision addition and subtraction should take on the order of 35 cycles for a well-written FP emulation library, and as comparison is closely related to subtraction, it should take somewhat less than this, as it does not have to produce a floating-point result. Note that if both operands are finite, positive values, single-precision comparison can be replaced with 32-bit integer comparison by re-interpreting the data as `uint32_t`. – njuffa Sep 16 '15 at 22:31
-
@njuffa : That was clearly an answer, and should not be posted as a comment. I'd upvote it if it were posted appropriately. – Clifford Sep 17 '15 at 06:47
-
@Clifford Thanks for the endorsement, I have expanded my comment into an answer. – njuffa Sep 17 '15 at 07:51
-
I believe the IEEE floating-point representation is designed such that the compare instruction can be used on ints or floats. Would this mean that there is no penalty for comparing floats? The only penalty, and potentially expensive portion, would be any conversions between int and float. Anyone care to comment? – jliu83 Sep 25 '15 at 04:46
3 Answers
The simplest way to determine the cost of floating-point comparison reliably would be to time it. If that is not possible for some reason, one might estimate it.
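If the target (or a simulator) is available, a minimal timing sketch along these lines can give a direct answer; it assumes a bare-metal Cortex-M0 and uses the ARMv6-M SysTick down-counter (register addresses per the ARMv6-M architecture; the scaffolding and names are illustrative):

```cpp
#include <cstdint>

#define SYST_CSR (*(volatile uint32_t *)0xE000E010) // control/status
#define SYST_RVR (*(volatile uint32_t *)0xE000E014) // reload value
#define SYST_CVR (*(volatile uint32_t *)0xE000E018) // current value

volatile bool sink; // keeps the compare from being optimized away

uint32_t cyclesForFloatCompare(float a, float b)
{
    SYST_RVR = 0x00FFFFFF;      // maximum 24-bit reload value
    SYST_CVR = 0;               // any write clears the counter
    SYST_CSR = 5;               // enable, processor clock, no interrupt
    uint32_t start = SYST_CVR;
    sink = (a < b);             // the operation under test
    uint32_t end = SYST_CVR;
    return start - end;         // SysTick counts down
}
```

In practice one would run the operation in a loop and subtract the cost of an empty loop measured the same way, to factor out the measurement overhead.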
I worked with FPU-less ARM processors around 2003, and wrote my own highly optimized single-precision floating-point emulation for them. While I no longer have access to that code, I found its performance very similar to the published performance numbers in this paper by Iordache and Tang for a floating-point emulation library on XScale.
The paper shows single-precision addition as executing in 35 cycles, and the time for subtraction would be essentially identical. Since comparison is a simplified form of subtraction in which no floating-point result needs to be computed, comparison should be somewhat cheaper; this puts an upper bound on its cost.
With floating-point emulation, single-precision operands are represented by 32-bit integers and stored as such in general-purpose registers. If both operands are finite and positive, they can be compared directly by integer comparison: the `binary32` operands are simply re-interpreted as `int32` values. This gives a lower bound on the cost. This answer shows how the approach can be generalized to pairs of non-exceptional operands.
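To make the lower-bound case concrete, here is a minimal sketch (the function name is mine; `memcpy` is used for the re-interpretation because it side-steps aliasing concerns and typically compiles to a plain register move):

```cpp
#include <cstdint>
#include <cstring>

// Valid only for finite, positive operands: for those, the IEEE 754
// binary32 bit patterns order the same way as 32-bit integers.
bool lessFinitePositive(float a, float b)
{
    uint32_t ai, bi;
    std::memcpy(&ai, &a, sizeof ai); // re-interpret, no conversion
    std::memcpy(&bi, &b, sizeof bi);
    return ai < bi;
}
```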
-
How would the time for repeated floating-point additions compare if numbers were stored as a 32-bit not-necessarily-normalized mantissa plus a 32-bit sign-and-exponent word? I've long thought there should have been an "extended float" to provide the advantages that extended double would have provided if ANSI had included separate types for "result of intermediate operations on double" and "80-bit extended float". – supercat Sep 17 '15 at 22:59
Haven't tested on a Cortex-M0, but comparing two floating-point numbers, at worst, works roughly as follows:

```cpp
#include <cstdint>

bool floatLess(float a, float b)
{
    // Re-interpret the bit patterns (illustrative; strictly speaking
    // this violates aliasing rules, so prefer memcpy in real code)
    uint32_t ai = reinterpret_cast<uint32_t const&>(a);
    uint32_t bi = reinterpret_cast<uint32_t const&>(b);
    if ((ai ^ bi) & 0x80000000u) // different signs
    {
        return ai > bi; // smaller one has sign bit set
    }
    else
    {
        return ai < bi; // lexicographic compare of exponent, mantissa
    }
}
```
A bit more is involved to take care of edge cases like NaN and -0 versus +0 (see the sketch below), but IEEE 754 numbers were designed to be easily comparable in an integer context. It's unlikely that there's a significant cycle difference.
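As an illustration of those edge cases, here is one common way to handle them; this is a sketch under the assumption that IEEE `<` semantics are wanted (any comparison involving NaN is false, and the two zeros compare equal), not a vetted library routine:

```cpp
#include <cstdint>
#include <cstring>

bool floatLessIEEE(float a, float b)
{
    uint32_t ai, bi;
    std::memcpy(&ai, &a, sizeof ai); // well-defined type punning
    std::memcpy(&bi, &b, sizeof bi);

    // NaN (exponent all ones, non-zero mantissa): always false
    if ((ai & 0x7FFFFFFFu) > 0x7F800000u ||
        (bi & 0x7FFFFFFFu) > 0x7F800000u)
        return false;

    // +0 and -0 compare equal, so neither is less than the other
    if (((ai | bi) & 0x7FFFFFFFu) == 0)
        return false;

    // Map sign-magnitude bit patterns onto monotonically increasing
    // unsigned keys, then compare
    uint32_t ak = (ai & 0x80000000u) ? ~ai : (ai | 0x80000000u);
    uint32_t bk = (bi & 0x80000000u) ? ~bi : (bi | 0x80000000u);
    return ak < bk;
}
```

The key mapping orders all non-NaN values monotonically as unsigned integers, which removes the sign-based branch from the final compare.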

-
The convert to int in this case is quite expensive, especially since it might not fit, and a dedicated compare that rips apart the sign, exponent and mantissa might be faster. But the poster has not explained how they can compare an int operation to a float one without talking about the rest of the code: is it always float, and do they want to convert float to int to compare? In that case the float-to-int conversion is the expensive part, not the int compare. And if they care about performance at all, then don't use float on a Cortex-M0; you can do a lot of extra math and still win on performance. – old_timer Sep 17 '15 at 13:48
-
@dwelch : A `reinterpret_cast` is cost free; however, it is then assigned to a `uint32_t`, causing an implicit conversion. I wonder whether that is an error and a plain `reinterpret_cast` was intended? However, I believe the code was intended to be illustrative of the process rather than an actual implementation; I'd treat it as pseudo-code in this context. – Clifford Sep 17 '15 at 16:31
-
Ahh, I thought you were converting from int to float; I see this solves the problem that everyone uses a union incorrectly for (even though that generally works). Thanks, I get it now... – old_timer Sep 17 '15 at 17:53
-
Wouldn't casting to a signed integer allow removing the if statement? – EmbeddedSoftwareEngineer Nov 22 '22 at 11:12
Compare operations `>`, `<`, `>=` and `<=` (avoid `==` for floating-point) are likely to be insignificant, especially for single-precision (though not as cheap as an integer compare). The simple test is to build some test code and look at the assembler output of the compiler, or at the disassembly in the debugger, as sketched below.
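For example, two probe functions like the following make the difference visible in the generated code; compile with something like `arm-none-eabi-g++ -mcpu=cortex-m0 -O2 -S` (the file layout and flags are illustrative) and inspect the output:

```cpp
#include <cstdint>

bool floatCompare(float a, float b)
{
    return a < b; // soft-float: expands to a runtime call
                  // (e.g. __aeabi_fcmplt with GCC's EABI runtime)
}

bool intCompare(int32_t a, int32_t b)
{
    return a < b; // a single cmp plus a conditional instruction
}
```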
However, the real impact of using software floating point is in arithmetic operations, and even more so in trig, log and sqrt functions.
There is seldom any real need for floating point, and a software fixed-point implementation will be much faster and more than adequate in many applications. I have measured the fixed-point code presented in this article by Anthony Williams at five times faster than single-precision floating point on ARM9, and am using it on a Cortex-M3. In that library the compare operators are integer operations (albeit 64-bit, so not as cheap as an `int32` compare); a minimal illustration of the idea follows below. In a specific algorithm, you may not even need 64-bit fixed point.
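To show why fixed-point comparison is just an integer compare, here is a minimal Q16.16 sketch (illustrative only; the library in the linked article uses a 64-bit representation instead):

```cpp
#include <cstdint>

typedef int32_t q16_16; // 16 integer bits, 16 fractional bits

constexpr q16_16 to_fixed(double x)
{
    return static_cast<q16_16>(x * 65536.0); // scale by 2^16
}

// Comparison is a plain integer compare...
inline bool fx_less(q16_16 a, q16_16 b) { return a < b; }

// ...and multiplication is a widening multiply plus a shift
inline q16_16 fx_mul(q16_16 a, q16_16 b)
{
    return static_cast<q16_16>((static_cast<int64_t>(a) * b) >> 16);
}
```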

-
Do we know whether the OP is including the conversion from float to int before doing the int compare, or is there some other magic so that the only cycle difference is in the fixed versus float compare, without any other soft-FPU versus fixed-point code? In general, taking extra effort to avoid float will often pay off greatly, but we really don't have enough info to answer this question, or even really understand it. – old_timer Sep 16 '15 at 23:16
-
@dwelch : I agree. I answered the question directly as far as possible, but suggest that it is the wrong question. The time taken waiting for an answer here would be better spent writing and evaluating implementations or experiments on the target (or simulator) to get a real answer rather than our best guess. – Clifford Sep 17 '15 at 06:52