While it would be possible to design floating-point hardware whose execution speed for any particular operation was independent of the operand values, it is generally advantageous to minimize the average-case time, especially when that can be done without affecting the worst-case time. For example, even if a chip would normally require six cycles to perform a double-precision floating-point multiply, performance in many applications could be improved if, at the same time as the chip started the multiplication process, a separate circuit did the following:
Set R1 if first operand is NaN or second operand is +/- 1.0
Set R2 if second operand is NaN or first operand is +/- 1.0
Set Z if either operand is +/- 0.0
Set N if either operand is NaN
If (R1 or R2 or Z)
    Set the body of the result, excluding sign, to (first-op & R1) | (second-op & R2)
    Set the sign of the result to (first-op & (R1 | !N)) ^ (second-op & (R2 | !N))
    Skip the rest of the multiplication
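The flag-and-mask logic above can be modeled in software. The sketch below is a hypothetical emulation, not any real chip's circuit: each one-bit flag (R1, R2, Z, N) is widened to an all-ones or all-zeros mask, and the result's body and sign are assembled exactly as the pseudocode describes. The function name and structure are illustrative assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of the fast-path logic. Returns 1 and
   stores the product in *out when a shortcut applies (an operand is
   NaN, +/-1.0, or +/-0.0); returns 0 when the full multiplier array
   would be needed. */
static int fast_path_mul(double a, double b, double *out) {
    uint64_t ba, bb;
    memcpy(&ba, &a, sizeof ba);          /* raw IEEE 754 bit patterns */
    memcpy(&bb, &b, sizeof bb);

    const uint64_t SIGN = 0x8000000000000000ULL;
    const uint64_t ONE  = 0x3ff0000000000000ULL;  /* bit pattern of 1.0 */
    uint64_t mag_a = ba & ~SIGN;          /* magnitude: drop sign bit */
    uint64_t mag_b = bb & ~SIGN;

    int r1 = isnan(a) || mag_b == ONE;   /* result body comes from a */
    int r2 = isnan(b) || mag_a == ONE;   /* result body comes from b */
    int z  = (mag_a == 0) || (mag_b == 0);
    int n  = isnan(a) || isnan(b);

    if (!(r1 || r2 || z))
        return 0;                        /* no shortcut; full multiply */

    /* Widen one-bit flags to full-word masks, as hardware would. */
    uint64_t m1 = r1 ? ~0ULL : 0;
    uint64_t m2 = r2 ? ~0ULL : 0;
    uint64_t nm = n  ? 0     : ~0ULL;    /* !N as a mask */

    /* Body = (first-op & R1) | (second-op & R2); zero when only Z set. */
    uint64_t body = ((ba & m1) | (bb & m2)) & ~SIGN;
    /* Sign = (first-op & (R1 | !N)) ^ (second-op & (R2 | !N)):
       the XOR of the operand signs when no NaN is present, otherwise
       the sign of the NaN operand(s). */
    uint64_t sign = ((ba & SIGN) & (m1 | nm)) ^ ((bb & SIGN) & (m2 | nm));

    uint64_t res = sign | body;
    memcpy(out, &res, sizeof res);
    return 1;
}
```

Note that the zero case needs no mask of its own: when Z is set and neither R1 nor R2 is, the body ORs together two all-zero masks, and the sign reduces to the XOR of the operand signs, giving a correctly signed zero.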
Adding the above logic would allow floating-point multiplies by +/- 1.0 or +/- 0.0 to complete in a sixth of the time required for multiplications not involving such constants. There are many scenarios where code accepts arbitrary scaling factors but is most often used with scaling factors of zero or one; some graphics applications, for example, might allow arbitrary scaling, rotation, and shear but be used most frequently with a scale factor of one, no rotation, and no shear. Expediting multiplication by zero and one, while requiring less hardware than it would take to shave a cycle off most multiplications, could in many scenarios offer a more useful performance boost.