Specifically, are these cumulative product functions in pandas and numpy implemented in a robust way to handle underflow when multiplying lots of small numbers together? For example, are they using the log-sum-exp trick?

Thanks.

WillZ
  • You can check this pretty easily. For example, set `x = np.array([1e-5, 1e-30, 1e-100, 1e-200, 1e50, 1e150])`, and compare `np.cumprod(x)` with `np.exp(np.cumsum(np.log(x)))`. – Warren Weckesser Apr 20 '17 at 01:08
  • Yes, I did something similar, but I wasn't sure where the theoretical bounds are, or whether I was just hitting a limit of my platform (machine/OS/etc.). – WillZ Apr 21 '17 at 10:31

1 Answer

Unfortunately, no. The example from @warren-weckesser's comment shows that `cumprod` does not protect against underflow:

import numpy as np

np.array([1e-5, 1e-30, 1e-100, 1e-200, 1e50, 1e150]).cumprod()

# returns
array([1.0e-005, 1.0e-035, 1.0e-135, 0.0e+000, 0.0e+000, 0.0e+000])

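The reason is that NumPy's default float type (float64) has a smallest positive normal value of 2**-1022, or about 2.225e-308. Below that, results are stored as subnormal numbers with reduced precision, and anything smaller than about 4.9e-324 (2**-1074) is rounded to zero. That is what we see in the output above: the fourth product would be 1e-335, so it becomes zero, and once the running product is zero it stays zero. The same is true for pandas, which does its arithmetic on the same float64 values.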
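As a rough illustration of those limits and of the sum-of-logs comparison suggested in the comment, here is a minimal sketch. The variable names are mine, and it assumes all inputs are strictly positive (otherwise `np.log` gives `-inf` or `nan`). Note that this is a plain cumulative sum of logarithms, not log-sum-exp.

```python
import numpy as np

x = np.array([1e-5, 1e-30, 1e-100, 1e-200, 1e50, 1e150])

# float64 limits referred to above
smallest_normal = np.finfo(np.float64).tiny   # 2**-1022
smallest_subnormal = np.nextafter(0.0, 1.0)   # 2**-1074

# Direct cumulative product: underflows to zero at the fourth element
# and never recovers, because 0 * anything finite is 0.
direct = np.cumprod(x)

# Cumulative product kept in log space: a running sum of logs.
# Only valid for strictly positive inputs.
log_cumprod = np.cumsum(np.log(x))

# Converting back with exp still underflows where the true product
# is below the float64 range, but later elements are recovered.
recovered = np.exp(log_cumprod)

print(direct)
print(log_cumprod / np.log(10))  # base-10 exponents of the running product
print(recovered)
```

If the products are only needed for comparisons or ratios, it is usually safer to keep working with `log_cumprod` and never exponentiate, since the log values stay comfortably within range even when the product itself does not.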

Warren Weckesser
James