Background
I'm passing large matrices into a Cython function which in turn calls the BLAS routine dgemm
. I've been pulling my hair out for 15 hours trying to understand why the BLAS output is not in line with np.dot
. I've narrowed it down to the f_contiguous
flag on any slice of on one of my input arrays.
Although on the face of things the array shows as f_contiguous
, any slice that I take (which will be passed into the BLAS routine is in fact not f_contiguous
, which it needs to be for the scipy.linalg.cython_blas
wrappers.
I'm new to both Cython & calling Fortran routines so I wouldn't be surprised if there is some glaring oversight on my part.
In Example I take a list of (4, 4)
arrays (with fortran contiguity) and concatenate them into a (4n, 4)
array. Because concatenate does not have an order
argument, I must then tell create a new array from the concatenated one using F order
. This produces the issue in which the array itself is f_contiguous
but any 2D slice I take from it is not.
Example
n = 10
arr_list = [np.eye(4, order='F') for _ in range(n)]
arr_concat = np.concatenate(arr_list)
arr_f = np.array(arr_concat, order='F')
print('Array F-Contiguous? {}'.format(arr_f.flags.f_contiguous))
print('2D Array Slice F-Contiguous? {}'.format(arr_f[0:5, :].flags.f_contiguous))
Array F-Contiguous? True
2D Array Slice F-Contiguous? False
Discussion
This issue is similar to the one posted here. The solution to me seems to construct a list of lists using comprehension. Then calling np.array(order='F')
on the nested list. Or iterating through the array and forcing each section to be f_contiguous
. But this doesn't seem elegant to me, and I'd like to avoid it because these matrices may get quite large. This function will be called tens of thousands of times so I need to make the process as fast as possible.
Am I doing something wrong here?