What is the math behind * 1233 >> 12 in this code counting decimal digits

Question

I am a bit confused how this short function from the C++ {fmt} library works.

inline std::uint32_t digits10_clz(std::uint32_t n) {
  std::uint32_t t = (32 - __builtin_clz(n | 1)) * 1233 >> 12;
  return t - (n < powers_of_10_u32[t]) + 1;
}

I understand the logic that you can approximate the log10 using log2(__builtin_clz) and that you need to adjust for exact value, but the multiplication is a mystery to me.

Sergey Kalinichenko · Accepted Answer · 2017-12-13T11:25:22.837

10

Recall the formula for changing the base of logarithm from b to d is

log_dx = log_bx / log_bd

In our case, b is 2 (binary), and d is 10 (decimal). Hence, you need to divide by log₂10, which is the same as multiplying by 1/log₂10, i.e by 0.30102999566.

Now recall that shifting by 12 is the same as dividing by 2¹², which is 4096. Dividing 1233 by 4096 yields 0.30102539062, which is a pretty good approximation for the denominator in the base change formula.

edited Dec 13 '17 at 11:25

answered Dec 13 '17 at 05:41

Sergey Kalinichenko

714,442
84
1,110
1,523

I have a further question. This trick of optimizing division is usually given by compiler and doesn't need to be in source code. Any possible reason it appears in source code this time? – Dec 13 '17 at 05:43
2

@NickyC It's hard to say - perhaps they copied this from [the source of bit tricks](https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10), and the author who wrote it there does not trust the compiler to optimize his code. I would replace the shift by division without hesitation. – Sergey Kalinichenko Dec 13 '17 at 05:50
@NickyC: because it's faster, and reliable. The other option to use floating point arithmetic is slower, and not reliable among platforms. The compiler doesn't do such optimizations, so this have to be in the source code. – geza Dec 13 '17 at 09:38
@geza: That's untrue. Compilers have been doing such optimizations for the last 25 years. It's utterly reliable. (Note that the division in question is `/4096`) – MSalters Dec 13 '17 at 10:59
@MSalters: okay, if we're talking about the `/4096` -> `>>12` transformation, then of course, this will be done by the compiler (for the unsigned case, they are exactly the same. For the signed case, they differ a little bit). I think I misunderstood Nicky's the question here... I was talking about the transformation of `*0.301...` -> `*1233>>12`. – geza Dec 13 '17 at 11:12
1

@MSalters: I misunderstood it, because there's a little typo in the answer. (I thought that Nicky's talking about this divison: "Hence, you need to divide by") – geza Dec 13 '17 at 11:18
1

@dasblinkenlight: There's a little typo in the answer: you either need to multiply by log10(2), or divide by log2(10). – geza Dec 13 '17 at 11:19
@geza Thanks for the comment! I switched `b` and `d` in the formula, and edited the rest of the answer to match. – Sergey Kalinichenko Dec 13 '17 at 11:30
2

The author of fmtlib here: as @dasblinkenlight correctly guessed the formula is indeed taken from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 as pointed out in the comment https://github.com/fmtlib/fmt/blob/0f98773164415ef70eef35daedab115935692524/fmt/format.h#L1054 – vitaut Dec 30 '17 at 02:20

What is the math behind * 1233 >> 12 in this code counting decimal digits

1 Answers1