The following (C99 and newer) code wants to compute a square, restricted to the same number of bits as the original fixed-width type.
#include <stdint.h>
uint8_t sqr8( uint8_t x) { return x*x; }
uint16_t sqr16(uint16_t x) { return x*x; }
uint32_t sqr32(uint32_t x) { return x*x; }
uint64_t sqr64(uint64_t x) { return x*x; }
The problem: depending on the size of int, some of these multiplications are performed on arguments promoted to (signed) int, and the result can overflow a (signed) int. That is undefined behavior as far as the standard is concerned, and could conceivably give a wrong result, especially on (increasingly rare) machines not using two's complement.
If int is 32-bit (resp. 16-bit, 64-bit, 80 or 128-bit), that occurs for sqr16 (resp. sqr8, sqr32, sqr64) when x is 0xFFFF (resp. 0xFF, 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFF). None of the 4 functions is formally portable under C99!
Does C11 or later, or some edition of C++, fix that unfortunate situation?
A simple, working solution is:
#include <stdint.h>
uint8_t sqr8( uint8_t x) { return 1u*x*x; }
uint16_t sqr16(uint16_t x) { return 1u*x*x; }
uint32_t sqr32(uint32_t x) { return 1u*x*x; }
uint64_t sqr64(uint64_t x) { return 1u*x*x; }
This is standards-conformant because 1u is not promoted to int and remains unsigned; thus the left multiplication, then the right one, are performed as unsigned and are well-defined to yield the correct result in the necessary number of low-order bits; the same holds for the final implicit conversion to the result width.
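As a sanity check of that reasoning, here is a minimal sketch that repeats the 1u variants and tests them at the inputs where the original versions can overflow. The function check_sqr_wraparound is a hypothetical name, not part of the code above. The expected values follow from modular arithmetic: the all-ones value is congruent to -1 modulo 2^N, so its square reduces to 1.

```c
#include <stdint.h>

/* The 1u variants, repeated so that this block is self-contained. */
uint8_t  sqr8( uint8_t  x) { return 1u*x*x; }
uint16_t sqr16(uint16_t x) { return 1u*x*x; }
uint32_t sqr32(uint32_t x) { return 1u*x*x; }
uint64_t sqr64(uint64_t x) { return 1u*x*x; }

/* Hypothetical self-check: returns 1 if each function yields the low-order
   N bits of the mathematical square at its maximum input. The all-ones
   value is -1 modulo 2^N, so the expected result is 1 in every case. */
int check_sqr_wraparound(void) {
    return sqr8(UINT8_MAX)   == 1
        && sqr16(UINT16_MAX) == 1
        && sqr32(UINT32_MAX) == 1
        && sqr64(UINT64_MAX) == 1;
}
```

A check like this only shows that a particular compiler produces the expected values; it does not prove conformance, which rests on the argument about unsigned arithmetic above.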
Updated: As suggested in a comment by Marc Glisse, I tried this variant with eight compilers (three versions of GCC for x86 starting with 3.1, MS C/C++ 19.00, Keil ARM compiler 5, two Cosmic compilers for ST7 variants, Microchip MCC18). They all generated the very same code as the original (with the optimizations I use in release mode for actual projects). However, compilers could conceivably generate worse code than the original; and I have several others of my embedded compilers to try, including some 68K and PowerPC ones.
What other options do we have, making a reasonable balance between likely better performance, readability, and simplicity?