Compute sqrt(SIZE_MAX+1) using only integer constant expressions, catering for weird ABIs

Question

OpenBSD's C library has an extension called reallocarray(3) which does realloc(array, size*nmemb) without blowing up if the multiplication overflows. The implementation contains this fragment:

/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

Over on Programmers.SE a modified version of that calculation got dinged for technical incorrectness. 4 should obviously be CHAR_BIT/2, but that's not the only problem. Suppose an unusual ABI in which size_t has padding bits. (This is not ridiculously implausible: consider a microcontroller with 32-bit registers but a 24-bit address space.) Then SIZE_MAX is less than 1 << (sizeof(size_t)*CHAR_BIT) [in infinite-precision arithmetic] and the calculation is wrong.

So, question: Can you compute floor(sqrt(SIZE_MAX+1)) using only C99 integer-constant-expression arithmetic, making no assumptions whatsoever about the ABI other than what C99 requires plus what you can learn from <limits.h>? Note that SIZE_MAX may equal UINTMAX_MAX, i.e. there may not be any type that can represent SIZE_MAX+1 without overflow.

EDIT: I think SIZE_MAX is required to be 2ⁿ − 1 for some positive integer n, but is not necessarily of the form 2²ⁿ − 1 — consider S/390, one of whose ABIs has a 31-bit address space. Therefore: If sqrt(SIZE_MAX+1) is not an integer, the desired result (given how this constant is used) is floor() of the true value.

You could use a hardcoded number of iterations of the Babylonian (Heron's) method http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method, however you would have to make assumptions about the maximum possible word size. — Peter G., Sep 02 '14 at 14:45
What about some really plausible assumptions, like `MUL_NO_OVERFLOW` being a power of two? — zch, Sep 02 '14 at 14:50
May one safely assume that `SIZE_MAX+1` is a power of two, and `2¹⁶ ≤ SIZE_MAX` ≤ 2⁶⁴? If so, that would mean there were at most 49 possible values, and one could use the `?:` operator to select one. Using some nested macros, an appropriate expression could be expressed reasonably concisely in source code. — supercat, Sep 02 '14 at 15:07
@zch I *think* unsigned integer types are required to have their `_MAX` be of the form 2^N-1 for some N, but I don't have my copy of the standard on this computer. — zwol, Sep 02 '14 at 15:21
@zch Mm, yes. Again I don't remember exactly what the standard says, but I know there have been systems where the abstractly correct choice for `SIZE_MAX` was 2^31-1. — zwol, Sep 02 '14 at 15:47
Rough idea: `#define M SIZE_MAX #define S2 1.41421... #define MUL_NO_OVERFLOW (size_t) ((((M&1)?S2:1) * ((M&2)?2.0:1) * ((M&4)?2.0*S2:1) * ((M&16)?4.0:1) ... ) + 0.0001)`. — chux - Reinstate Monica, Sep 02 '14 at 16:07
@chux Floating point arithmetic cannot be used in an *integer constant* expression. — zwol, Sep 02 '14 at 16:28
Why can't floating point arithmetic be used such as `#define X ((int) (1.5*1.5))` to form `int y[X];`? — chux - Reinstate Monica, Sep 02 '14 at 17:01
@chux Because the standard says you can't. (Or, if you like, because the standard does not require the implementation to be able to evaluate floating-point expressions at compile time.) — zwol, Sep 02 '14 at 17:47
Would it be a cheat to assume `SIZE_MAX == power(2,n)-1` and then `#if SIZE_MAX == 65535 #define MUL_NO_OVERFLOW 256 #else if SIZE_MAX == 131072 #define MUL_NO_OVERFLOW 362 #else if SIZE_MAX == 262144 #define MUL_NO_OVERFLOW 512 ... #endif`? (Have a case for each (16 <= n <= 64 or 128) No points for elegance, but appears to get the job done. — chux - Reinstate Monica, Sep 02 '14 at 18:08
I'm not sure if you need this test altogether, but you might get even better results with `(nmemb >= SIZE_MAX/256 || size >= 256)`, bypassing the whole square root problem. — zch, Sep 03 '14 at 10:11

pablo1977 · Answer 1 · 2014-09-02T16:15:03.980

The constant SIZE_MAX is non-negative and has type size_t. For shortness, I will define:

  #define S SIZE_MAX

The mathematical value S+1 is or can be, as you pointed out, out of range for any integer type.
I will write S1 for the mathemtaical value of S+1.
If we consider the logarithm (in base 2, if you want) of S1, then we have:

 logarithm(sqrt(S1)) == (1.0/2.0) logarithm(S1)

On the other hand, in almost-for-sure every realistic situation, we will have that S is represented as a binary number having only 1 bits. The number b of this bits is, in general, the number CHAR_BIT multiplied by a power of two, multiplied by CHAR_BIT: 16, 32, 64, 128... I will denote the exponent of this power by p. Thus, for CHAR_BIT == 8, we have:

16 == CHAR_BIT * 2 ----> p == 1
32 == CHAR_BIT * 4 ----> p == 2
64 == CHAR_BIT * 8 ----> p == 3

Now we have:

logarithm(S1) == b == CHAR_BIT * (2 ** p)   (I am denoting with ** to the "power math. operator").

logarithm(sqrt(S1)) == logaritm(S1) / 2.0 == CHAR_BIT * (2 ** p) / 2.0 == CHAR_BIT * (2 ** (p - 1))

By assuming or knowing that every bit in size_t is used only to represent the bits of an integer number, we have this equality, por some (unknown) value of p:

sizeof(size_t) == b == CHAR_BIT * (2 ** p)

We can assume, for the state-of-the-art in 2014, that the value of p <= 5, say (you can grow this magic number 5 to bigger values in what follows).

Now, consider the following expression, intended to "search and found" the value of b, under the assumption that p <= 5:

#define S_1 ((size_t)1ULL)
#define b (sizeof(size_t))
#define bitexpr(p) ((size_t)(CHAR_BIT * (S_1 << (p))))
#define expr(p) ((size_t) (S_1 << (p)))
#define exp2_expr_1(p) ((size_t)(S_1 << bitexpr(p-1)))
// SRSM() stands for: Square Root SizeMax
#define SRSM  ( \
    (expr(1)==b)? exp2_expr_1(1) :             \
        (expr(2)==b)? exp2_expr_1(2) :         \
            (expr(3)==b)? exp2_expr_1(3) :     \
              (expr(4)==b)? exp2_expr_1(4) :   \
                (expr(5)==b)? exp2_expr_1(5) : \
                   (size_t)0  /* Error! */     \
    ) /* end-of-macro*/

The macro SRSM brings, actually, the square root of S+1, but I suppose you can figure out what to do with this number.

What is important here is that the square root of SIZE_MAX can be obtained by using purely integer constant expressions.

If you want, the "magic" number 5 can be changed by another one.

A more general approach, intended to solve an arbitrary situation, on any possible machine fitting the standard, it would be more complicated.
The method used in this post is independent of the value that has CHAR_BIT, but it uses that the number of bytes is a power of 2.

EDITED: I changed a little the method for "search", starting by 1 and then growing up, to avoid possible "false" matches with << operator and big numbers (one never knows...). Now, the first match is, for sure, the correct.

This seems like it's going in the right direction, but I think you may have missed that your value `b` is *not* necessarily `CHAR_BIT*p` for some power-of-two p, or even some *integer* p. A `SIZE_MAX` of 524287 (2^19 - 1) is probably legit, and I'm only saying "probably" because I don't have a copy of C99 on this computer. — zwol, Sep 02 '14 at 15:57
The value of `b` is a logarithm, it's the number of bits, some, it's CHAR_BIT * (something integer). On the other hand, I know the generality of C99 standard. By I am assuming what one would have in practical cases. In your case, the number of bits is 19. Do you really have this case in your implementation for the value of `SIZE_MAX`? — pablo1977, Sep 02 '14 at 16:02
Anyway, probably you can use the method I wrote as a first quick approximation, and then advance a little more, bit-by-bit, until you reach the exact number of bits, say 19. If `SIZE_MAX+1` were not a power of 2, it would be a very exotic implementation. — pablo1977, Sep 02 '14 at 16:07
In fact, I have no such implementation -- the original code works fine for all systems I care about in practice -- but this is explicitly a "can you do this at all, considering the full generality of what is allowed by the standard" question. — zwol, Sep 02 '14 at 16:27
Well, the main problem is to obtain some upper bound of the quantity you are looking for, which can be achieved by using my method (maybe with some little modification). Then, the result can be improved by doing some "binary search" in the powers of 2. Perhaps the consideration of CHAR_BIT would have been withdrawn at all. — pablo1977, Sep 02 '14 at 19:30

score 1 · Answer 2 · answered Oct 16 '17 at 21:48

An alternative to solve the sqrt(SIZE_MAX+1) is to look at the higher level goal shown here and below.

/* s1*s2 <= SIZE_MAX if s1 < K and s2 < K, where K = sqrt(SIZE_MAX+1) */
const size_t MUL_NO_OVERFLOW = ((size_t)1) << (sizeof(size_t) * 4);
if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
    nmemb > 0 && SIZE_MAX / nmemb < size)
  abort();

A simple overflow detection could use the following. This unfortunately performs division - an expansive test, yet otherwise "works". No MUL_NO_OVERFLOW needed.

if (nmemb > 0 && (SIZE_MAX / nmemb < size))
  abort();

Instead of dividing, OP's code does 2 simple compares against MUL_NO_OVERFLOW to eliminate most contenders.

if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) && ...

Yet the pre-processor computation of MUL_NO_OVERFLOW is challenging , hence this post.

The alternative is to use an easy to compute MUL_NO_OVERFLOW_ALT to eliminate many size, nmemb pairs from the division test.

// good for up to SIZE_MAX + 1 == 2**128
#define SQN(x) ( !!(x) )
#define SQ0(x) ( (x) >= 0x2u ? 1 + SQN((x)/0x2u) : SQN(x) )
#define SQ1(x) ( (x) >= 0x4u ? 2 + SQ0((x)/0x4u) : SQ0(x) )
#define SQ2(x) ( (x) >= 0x10u ? 4 + SQ1((x)/0x10u) : SQ1(x) )
#define SQ3(x) ( (x) >= 0x100u ? 8 + SQ2((x)/0x100u) : SQ2(x) )
#define SQ4(x) ( (x) >= 0x10000u ? 16 + SQ3((x)/0x10000u) : SQ3(x) )
#define SQ5(x) ( (x) >= 0x100000000u ? 32 + SQ4((x)/0x100000000u) : SQ4(x) )
#define SQ6(x) ( (x)/0x100000000u >= 0x100000000u ? 64 + SQ5((x)/0x1000000000000000u/16) : SQ5(x) )
#define MUL_NO_OVERFLOW_ALT (((size_t)1) << (SQ6(SIZE_MAX)/2))

printf("%zx %zx\n", SIZE_MAX, MUL_NO_OVERFLOW_ALT);
// Output
// ffffffff 10000

In OP's case, even if the best (size_t)sqrt(SIZE_MAX + 1.0) was possible at compile time, the division test would still be needed to pass legitimate pairs of size, nmemb like 4, SIZE_MAX/100. So for this task, the best ISQRT(SIZE_MAX + 1.0) is not needed, just something near and not greater than.

With this code the best value is calculated when SIZE_MAX + 1 is an even power of 2 - a common case.

score 0 · Answer 3 · answered Sep 03 '14 at 22:30

The following works but needs clean-up.

Assumes:
1) SIZE_MAX + 1 is some power of 2, odd or even.
2) unsigned long long range at least size_t.
3) sizeof (SIZE_MAX) <= 128 bits.

Notes:
1) SIZE_MAX must be at least 65535.
2) Way too many ()
3) Needs more testing.

Method:
Find power of 2 of SIZE_MAX/4 + 1 via nested defines. Needs another level for each double of bits. At each bit width level 32, 64, 128, ... test upper bits of SIZE_MAX. Once the maximum power-of-2 bit width is found, recurse down to get bit with of SIZE_MAX.

Form the sqrt(SIZE_MAX + 1) from (1 << Bit_Width(SIZE_MAX/4 + 1)/2)*2. Multiply by a scaled sqrt(2) if bit width was odd.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>

int main(void) {
  // Swap 1 for various powers of 2 like 256, 512. etc. to test
#define M  SIZE_MAX/1
  printf("M %zX\n", (size_t) M);
#define N (M/4 + 1)
  printf("N/4 + 1 %zX\n", (size_t) N);

// Obviously this line can be simplified.
#define M0(x)  ((x & ~0x0U) ? 0 : 0)

#define M1(x)  ((x & ~0x1U) ? 1 + M0(x >> 2) : M0(x))

#define M2(x)  ((x & ~0x3U) ? 2 + M1(x >> 2) : M1(x))

#define M3(x)  ((x & ~0xFU) ? 4 + M2(x >> 4) : M2(x))

#define M4(x)  ((x & ~0xFFU) ? 8 + M3(x >> 8) : M3(x))

#define M5(x)  ((x & ~0xFFFFLLU) ? 16 + M4(x >> 16) : M4(x))

#if (M & ~0xFFFFFFFFLLU)
#define M6(x)  ((x & ~0xFFFFFFFFLLU) ? 32 + M5(x >> 32) : M5(x))
#if (M & ~0xFFFFFFFFFFFFFFFFLLU)
#define MN(x)  ((x & ~0xFFFFFFFFFFFFFFFFLLU) ? 64 + M6(x >> 64) : M6(x))
#define MSQ2(odd, x) ((odd&1) ? (x*13043817825332782212llu)>>63 : x)
#else
#define MN(x)  M6(x)
#define MSQ2(odd, x) ((odd&1) ? (x*3037000499llu)>>31 : x)
#endif
#else
#define MN(x)  M5(x)
#define MSQ2(odd, x) ((odd&1) ? (x*46340llu)>>15 : x)
#endif

  printf("MN(N) %d\n", MN(N));
  printf("MN(M) %d\n", MN(M));

#define MUL_NO_OVERFLOW   ((size_t) MSQ2(MN(N), ((((size_t)1) << (MN(N)/2))*2) ))

  printf("MUL_NO_OVERFLOW %zu\n", MUL_NO_OVERFLOW);
  return 0;
}

Looks too complicated - will see about simplifying. GTG

Compute sqrt(SIZE_MAX+1) using only integer constant expressions, catering for weird ABIs

3 Answers3