I have a list of lists of floats, like this:

u = [[1.2, 1.534, 23.5, ...], [0.2, 11.5, 3.3223, ...], ...]

I use plain Python to calculate a new list (height and width are the dimensions of the lists; u2 is a list of lists of floats initialized to 0.0):

from copy import deepcopy

for time in xrange(start, stop):
    for i in xrange(1, height - 1):
        for j in xrange(1, width - 1):
            u2[i][j] = u[i][j-1] + u[i-1][j] - time * (u[i][j+1] / u[i+1][j])
    u = deepcopy(u2)

As expected, this produces a new list of lists of floats.

However, transferring this to NumPy, with a simple:

from numpy import array

un = array(u)

Then using the same kind of loop (u2 being an array of zeroes this time):

for time in xrange(start, stop):
    for i in xrange(1, height - 1):
        for j in xrange(1, width - 1):
            u2[i][j] = un[i][j-1] + un[i-1][j] - time * (un[i][j+1] / un[i+1][j])
    un = u2

... produces the same results as the pure-Python implementation as long as height, width and the time range are all small, but increasingly different results as these variables are set higher and higher.

  • Is there a way to prevent this build-up of float-inaccuracy?

(This is not real code, just me fiddling around to understand how numbers are treated in Python and NumPy, so any suggestions regarding vectorization or other NumPy-efficiency tricks are off-topic.)

darthur
  • Possible duplicate of [Floating point math in python / numpy not reproducible across machines](http://stackoverflow.com/questions/30065437/floating-point-math-in-python-numpy-not-reproducible-across-machines) – 301_Moved_Permanently Oct 26 '15 at 09:26
  • Thanks. But is the difference between floats in Numpy and pure Python equal to the difference of floats on other platforms? What is the best practice to avoid a build-up of inaccuracy between results from calculations in pure Python vs results from calculations in Numpy? – darthur Oct 26 '15 at 10:12
  • Can you please specify the Python/NumPy version and u2.dtype? – B. M. Oct 26 '15 at 10:25
  • Python 2.7.10 and NumPy 1.10.1. u2.dtype is float64. – darthur Oct 26 '15 at 11:17
  • @MathiasEttinger I'm not convinced that this is a duplicate - that question is about differences in floating point results between the same code run on different machines, whereas the OP is asking about two specific Python and numpy implementations that are (presumably?) being executed on the same machine. – ali_m Oct 26 '15 at 11:58

1 Answer


At first glance the problem seems to be `un = u2`. This creates a reference to u2 rather than a copy, so writing to u2 inside your inner loop also modifies un. This will give you different results from the pure-Python version, since the value at u2[i][j] depends on un[i][j-1] and un[i-1][j], which have already been overwritten in the current time step.

Try un = u2.copy() to force a copy instead.
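The aliasing behaviour is easy to see in a minimal sketch (the array names here are made up for illustration):

```python
import numpy as np

a = np.zeros((2, 2))
b = a            # plain assignment: b is just another name for the same array
b[0, 0] = 1.0
print(a[0, 0])   # 1.0 -- writing through b also changed a

a = np.zeros((2, 2))
c = a.copy()     # .copy() allocates an independent array
c[0, 0] = 1.0
print(a[0, 0])   # 0.0 -- a is untouched this time
```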

ali_m
  • Still different. The inaccuracies exist even when using a vectorized calculation. – darthur Oct 26 '15 at 09:49
  • Can you turn your code into a reproducible example? – ali_m Oct 26 '15 at 09:54
  • Will post when I'm back in front of my computer. Also, is using u2 = u2[i][j] + .... in the loop body alright (or as bad as un = u2 after the loop)? – darthur Oct 26 '15 at 10:16
  • I assume you mean `u2[i][j] = u2[i][j-1] + ...`? Yes, that would be "as bad" as assigning `un = u2` outside the loop. In your Python code, for each iteration over `time` the updated values in `u2` depend only on the "untainted" values from the previous `time` step that are stored separately in `u`. In your numpy code you effectively have a single array that is constantly being updated inside your inner loops. – ali_m Oct 26 '15 at 12:20
  • Like ali_m said, you must put `un = u2.copy()` to have the same calculation. Moreover, `u[i][j+1] / u[i+1][j]` puts `nan` values in your data after the second loop. But `np.nan==np.nan` is evaluated `False` , so `un` and `u` seems to be different with the same data. It's difficult to say more without concrete data and measure of inaccuracy. – B. M. Oct 26 '15 at 13:23
  • Thanks. The equation provided was made up on the spot, just for this thread. I've tried many different equations. The core of the problem is this: doing a floating-point calculation with many "repeats" produces different results in NumPy vs. Python. I'm looking at around a 10% difference when t = 10000 and height and width are 100. – darthur Oct 26 '15 at 13:27
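As an aside, the NaN comparison pitfall B. M. mentions in the comments can be demonstrated with a short, self-contained snippet (the 0/0 division here is contrived for illustration):

```python
import numpy as np

# A 0/0 division yields nan; errstate suppresses the RuntimeWarning.
with np.errstate(invalid='ignore'):
    x = np.array([0.0]) / np.array([0.0])

print(np.isnan(x[0]))   # True
print(x[0] == x[0])     # False -- nan never compares equal, even to itself
```

This is why two arrays holding "the same" data can still appear to differ once NaNs have crept in.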