## Summary
As I see it, the problem you noticed is a consequence of the fundamental limitations of floating-point arithmetic; in particular:
- the order of floating-point calculations (in the given case: summation) matters, and
- different algorithms that use different orders may thus produce slightly different results for the exact same inputs, neither of which should be considered "more correct" than the other.
## Why the order of summation matters
Consider the example in the code below, in which we want to sum all elements of an array `a`. We will do this in two different orders: `sum_1` sums all values from left to right, while `sum_2` progressively sums neighboring pairs of values until only one value remains. Mathematically, this should not make a difference, as summation is an associative operation on the real numbers. It is not associative for the floating-point numbers on a computer, however, as we can see from the output of the last line:
```python
import numpy as np

eps_half = np.finfo(float).eps / 2
a = np.asarray([0.5, 0.5, eps_half, eps_half], dtype=float)

sum_1 = ((a[0] + a[1]) + a[2]) + a[3]  # left to right
sum_2 = (a[0] + a[1]) + (a[2] + a[3])  # pairwise

print(sum_1, "versus", sum_2)
# >>> 1.0 versus 1.0000000000000002
```
What is going on here? Notice the use of `eps`, which is defined as the difference between 1.0 and the smallest representable float greater than 1.0. For the `float` type, at least on my system, its value is ca. `2e-16`, i.e. 0.0000000000000002. The definition implies that, if we add a value of `eps / 2` or smaller to 1.0, the result is rounded back to 1.0!
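To make these claims tangible, here is a minimal check (the exact printed value of `eps` may look slightly different on your system):

```python
import numpy as np

eps = np.finfo(float).eps
print(eps)                   # ca. 2.220446049250313e-16 for 64-bit floats

print(1.0 + eps / 2 == 1.0)  # True: eps / 2 is rounded away
print(1.0 + eps == 1.0)      # False: eps is exactly the gap to the next float
print(1.0 + eps)             # 1.0000000000000002
```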
In the constructed example from above, this rounding-away happens in the calculation of `sum_1` but not of `sum_2`, as we can see from their calculation steps:
- steps for `sum_1`:
  1. `0.5 + 0.5 → 1.0` (add `a[0]` and `a[1]`: works as expected)
  2. `1.0 + eps/2 → 1.0` (add `a[2]` to the previous sum: `eps/2` is too small for the result to be representable as its own float value larger than 1.0, so the sum remains 1.0)
  3. `1.0 + eps/2 → 1.0` (add `a[3]` to the previous sum: same problem as in step 2)
- steps for `sum_2`:
  1. `0.5 + 0.5 → 1.0` (add `a[0]` and `a[1]`: works as expected)
  2. `eps/2 + eps/2 → eps` (add `a[2]` and `a[3]`: works as expected)
  3. `1.0 + eps → 1.0000000000000002` (add the results from steps 1 and 2: works as expected, because `eps` is large enough for the result to be representable as its own float value larger than 1.0)
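The following small sketch replays these steps directly, so you can verify each intermediate result yourself:

```python
import numpy as np

eps = np.finfo(float).eps

# steps for sum_1 (left to right)
step_1 = 0.5 + 0.5          # 1.0
step_2 = step_1 + eps / 2   # 1.0 -- eps / 2 is rounded away
step_3 = step_2 + eps / 2   # 1.0 -- same problem again
print(step_1, step_2, step_3)

# steps for sum_2 (pairwise)
pair_1 = 0.5 + 0.5          # 1.0
pair_2 = eps / 2 + eps / 2  # eps -- exactly representable
total = pair_1 + pair_2     # 1.0000000000000002
print(pair_1, pair_2, total)
```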
## What this has to do with matrix-vector products
You might ask yourself what your observation has to do with my example from above. Here is what is most likely going on:
Each element in the result vector of a matrix-vector product, as calculated in your code, is essentially the dot product of two vectors: (1) the corresponding row of the given matrix and (2) the given vector. The dot product, by definition, multiplies the corresponding elements of the two vectors, then sums these intermediate results. The only place where a matrix-vector product calculated from identical input values can come out differently therefore lies in this summation; or rather, in its order, as the final step of each dot product (see the sketch after the following list). And here is the thing:
- We see rounding effects like the one in my example above in floating-point summation all the time; this is not a particular problem of summing 1.0 and `eps`, but a general problem of summing large and small floating-point values.
- In array summation, no summation order is "more correct" or "less correct" than any other; in fact, how close the summation result is to the "true" result (i.e. the result calculated with infinite precision) depends on the order of the values in the array. (Some summation orders generally produce more robust results than others, but discussing this would go too far at this point.)
- It is up to the implementation of our matrix-vector multiplication algorithm which summation order to choose, and it might choose differently, e.g. for speed reasons or for different memory layouts of the given data. In your case, you definitely have different memory layouts for your matrix data (C vs. Fortran order), so that is most likely what happens: for the C-order array, a different summation order is chosen than for the Fortran-order one, so different rounding effects occur, leading to (slightly) different results, as the sketch below illustrates.
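Here is a minimal sketch of this effect. It assumes that your installed BLAS takes different code paths for the two memory layouts, which is common but not guaranteed, so the difference may well be exactly zero on some systems:

```python
import numpy as np

rng = np.random.default_rng(0)
A_c = rng.random((500, 500))     # C-contiguous (row-major) matrix
A_f = np.asfortranarray(A_c)     # same values, Fortran-order (column-major) layout
x = rng.random(500)

y_c = A_c @ x
y_f = A_f @ x

# The two layouts may be processed with different summation orders,
# so tiny discrepancies of a few eps can appear:
print(np.array_equal(y_c, y_f))  # may be False, depending on the BLAS
print(np.abs(y_c - y_f).max())   # tiny, but possibly nonzero
```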
## What is the impact of all this
Let me try to summarize the key takeaways: the order of floating-point summation matters; no summation order is "more correct" than another; and different algorithms or memory layouts may legitimately choose different orders, leading to slightly different results for identical inputs.
That said, I admit that I somewhat doubt your final conclusion, namely that the huge errors you see in `sklearn.linear_model.LinearRegression` have the same cause; maybe they are the result of the aforementioned amplification effects. You should perhaps provide a minimal reproducible example for further investigation there.