
I have a file containing a large number, N, of 32-bit floats. This file is created using numpy's memmap function as follows:

mmoutput = np.memmap("filename", dtype='f4', mode='w+', offset=0, shape=N)
mmoutput[:] = my_floats
mmoutput.flush()

When I load these coefficients back in with numpy and sum them:

mminput = np.memmap("filename", dtype="f4", mode='c', offset=0, shape=N)
mminput.sum()

I get the value 82435.047 (which is correct).

However, when I read in the floats using C's mmap as follows:

#include <fcntl.h>
#include <sys/mman.h>

int fd = open("filename", O_RDONLY, 0);
float * coefs = (float*) mmap(NULL, sizeof(float) * N, PROT_READ, MAP_SHARED, fd, 0);
double sum = 0.0;
for (int i = 0; i < N; i++) sum += coefs[i];

The numbers sum to a different value: 82435.100.

Can someone help spot my error? Perhaps there is a difference between the way numpy writes its floats and C reads them?
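
One way to check whether the bytes themselves differ (rather than just the sums) would be to compare bit patterns; a rough sketch, with "filename" and N as placeholders, whose output can be checked against numpy's mminput[:5].view('u4'):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void) {
    const size_t N = 1000;                      /* placeholder: replace with the real element count */
    int fd = open("filename", O_RDONLY);
    if (fd < 0) return 1;
    float *coefs = (float *) mmap(NULL, sizeof(float) * N, PROT_READ, MAP_SHARED, fd, 0);
    if (coefs == MAP_FAILED) return 1;
    for (size_t i = 0; i < 5 && i < N; i++) {
        uint32_t bits;
        memcpy(&bits, &coefs[i], sizeof bits);  /* reinterpret the float's bytes without UB */
        printf("coefs[%zu] = %.9g  bits = 0x%08x\n", i, coefs[i], (unsigned) bits);
    }
    munmap(coefs, sizeof(float) * N);
    close(fd);
    return 0;
}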

Full disclosure

I was actually just calculating the sum of these numbers as a check that they are the same. The real use of them is as coefficients in a B-spline (implemented using the einspline library as shown, for example, here https://github.com/ahay/src/blob/master/user/cram/esc_slow2.c). When I evaluate the splines in Python and C I get different values.

JMzance

1 Answer


I get the value 82435.047 (which is correct).

No, it is not. You have summed 'a large number' of single-precision floating-point values, so the result is unlikely to be accurate to more than four or five significant digits, especially if the values span a large dynamic range.
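
As a toy illustration, unrelated to your actual data: accumulate the same addend into a float and into a double. Once the running sum becomes large relative to the addend, the float accumulator starts rounding away most of each addition, while the double keeps the low-order bits.

#include <stdio.h>

int main(void) {
    float  fsum = 0.0f;
    double dsum = 0.0;
    /* Add 0.1f ten million times; the exact answer is 10^7 * (float)0.1f. */
    for (int i = 0; i < 10000000; i++) {
        fsum += 0.1f;   /* rounding error grows as fsum grows */
        dsum += 0.1f;   /* double accumulator retains the low bits */
    }
    printf("float accumulator : %f\n", fsum);
    printf("double accumulator: %f\n", dsum);
    return 0;
}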

I suspect that numpy is performing the summation in a way that preserves more precision, such as accumulating in double. The source traces through several calls to umath.add.reduce before I got lost, but it is clearly doing something better than a naive single-precision left-to-right loop. Try accumulating into a double, or use Kahan summation, to get a result that does not lose precision; Kahan summation gives a more accurate result than a plain reduction because it carries a compensation term for the low-order bits lost at each addition.
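
A minimal sketch of Kahan summation over the mapped buffer, reusing coefs and N from your question (compile without -ffast-math, which would optimise the compensation away):

#include <stddef.h>

/* Kahan summation: c carries the low-order bits lost by each addition. */
float kahan_sum(const float *x, size_t n) {
    float sum = 0.0f;
    float c   = 0.0f;                 /* running compensation */
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;           /* apply the correction from the last step */
        float t = sum + y;            /* low-order bits of y are lost here */
        c = (t - sum) - y;            /* recover what was lost */
        sum = t;
    }
    return sum;
}

/* usage, given the mapping from the question: */
/* double total = kahan_sum(coefs, N); */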

Pete Kirkham
  • The sum is improved when I use double sum=0.0 rather than float sum. However, the values are still different, albeit by a smaller amount. And when I assign the values to the B-spline and evaluate it, the results are different – JMzance Jul 03 '17 at 13:10
  • @JMzance if you want exactly the same results, use the same implementation strategy and optimisations that numpy does. But if your application requires six significant figures for the sum, then you shouldn't have your data in floats to begin with, as you will lose precision with each operation. So is your goal to ensure numpy and your simple C code are behaving exactly the same way, or trying to get an answer to 'what is the sum of these values'? – Pete Kirkham Jul 03 '17 at 13:28
  • The former - I need to pass in the same single precision floats to einspline to use as coefficients. I don't mind that they sum to different things so long as they are read exactly the same from the file – JMzance Jul 03 '17 at 13:35
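
On that last point: the bytes mapped in C are the same bytes numpy wrote, so the coefficients themselves are read identically; the differing sums are purely an artifact of accumulation order. The np.sum notes say numpy typically uses partial pairwise summation for float reductions, so a plain left-to-right loop will generally not reproduce its result bit-for-bit. A rough sketch of the pairwise idea (not numpy's exact blocking, which also depends on SIMD vectorisation) is:

#include <stddef.h>

/* Pairwise (recursive) summation: split the array in half, sum each half,
   then add the two partial sums.  Rounding error grows roughly like
   O(log N) instead of O(N) for the naive loop. */
float pairwise_sum(const float *x, size_t n) {
    if (n <= 8) {                       /* small base case: plain loop */
        float s = 0.0f;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }
    size_t half = n / 2;
    return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
}

/* usage with the mapping from the question: */
/* double total = pairwise_sum(coefs, N); */

Even so, bit-exact agreement with numpy's sum is not guaranteed, since the block size and grouping differ; for einspline the important thing is that the float coefficients handed over are identical, which they are regardless of how the diagnostic sums come out.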