
I am looking for a more efficient way to assign to a column of a NumPy structured array.

Example:

my_col = fn_returning_1D_array(...)

runs more than twice as fast on my machine as the same assignment to a column of a structured array:

test = np.ndarray(shape=(int(8e6),), dtype=np.dtype([('column1', 'S10'), ...more columns...]))
test['column1'] = fn_returning_1D_array(...)

I tried creating `test` with Fortran ordering, but it did not help. Presumably the fields stay interleaved in memory.

Does anybody have any ideas? I would be willing to use low-level NumPy interfaces and Cython if they could help.


Edit 1: in response to hpaulj's answer

The apparent equivalence of structured-array field assignment and "normal" array column assignment holds only if the latter is created with row-major ordering. With column-major ordering the two assignments are far from equivalent:

Row-major

In [1]: import numpy as np

In [2]: M,N=int(1e7),10

In [4]: A1=np.zeros((M,N),'f')

In [9]: dt=np.dtype(','.join(['f' for _ in range(N)]))

In [10]: A2=np.zeros((M,),dtype=dt)

In [11]: X=np.arange(M+0.0)

In [13]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 2.36 s per loop

In [15]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.36 s per loop

In [16]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 334 ms per loop

In [8]: A1.flags
Out[8]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Column-major

In [1]: import numpy as np

In [2]: M,N=int(1e7),10

In [3]: A1=np.zeros((M,N),'f', 'F')

In [4]: dt=np.dtype(','.join(['f' for _ in range(N)]))

In [5]: A2=np.zeros((M,),dtype=dt)

In [6]: X=np.arange(M+0.0)

In [8]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 374 ms per loop

In [9]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.43 s per loop

In [10]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 380 ms per loop

In [11]: A1.flags
Out[11]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Note that for column-major ordering the two buffers are no longer identical:

In [6]: A3=np.zeros_like(A2)

In [7]: A3.data = A1.data

In [20]: A2[0]
Out[20]: (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

In [21]: A2[1]
Out[21]: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)

In [16]: A3[0]
Out[16]: (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)

In [17]: A3[1]
Out[17]: (10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0)
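
Conversely, the C-order buffer identity can be checked without touching `.data`, by taking a no-copy view. A minimal sketch, using the same shapes and dtype construction as the sessions above (`np.shares_memory` is assumed available, NumPy >= 1.11):

```python
import numpy as np

M, N = 100, 10
dt = np.dtype(','.join(['f' for _ in range(N)]))

A1 = np.zeros((M, N), 'f')                   # C-ordered: each row is contiguous
A1[:, :] = np.arange(M, dtype='f')[:, None]  # fast broadcast fill

A2 = A1.view(dt).reshape(M)                  # reinterpret the same buffer, no copy
print(A2[1])                                 # every field of record 1 holds 1.0
```

Each 10-float row of `A1` lines up exactly with one 10-field record of `dt`, which is why the view is legal; with Fortran ordering the rows are no longer contiguous and this mapping breaks, as the `A3` output above shows.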
ARF
  • `my_col = fn_returning_1D_array(...)` is just assigning a label to the data returned by the function. The assignment does not copy or allocate any array data. – YXD Apr 18 '15 at 15:43
  • @MrE Agreed. Not copying would be ideal for the column assignment as well. Even if assignment is not possible without copying, the copy process takes much more time than it should for a memory to memory copy. I suspect this is because the data is not copied en bloc but interleaved with the other fields in the record. – ARF Apr 18 '15 at 16:47
  • Compare this to filling columns of `np.empty((int(8e6),ManyColumns), dtype='S10')`. – hpaulj Apr 18 '15 at 16:48
  • @hpaulj This creates a "normal" array (i.e. with identical dtypes). A structured array allows different dtypes for different columns. – ARF Apr 18 '15 at 16:50
  • I think of structured arrays as arrays of tuples, where each tuple contains the values of all the fields for one 'row'. You are assigning one value per 'tuple'. It may be a bit more streamlined than that, but that's the general idea. No block copies. – hpaulj Apr 18 '15 at 16:50
  • I was wondering how the column assignment speed is in a 'normal' array as opposed to this structured one. – hpaulj Apr 18 '15 at 19:17
  • @hpaulj I am still seeing a factor of 1.6 if I force a copy (which is likely very similar to column assignment): `my_col = fn_returning_1D_array(...).copy()` – ARF Apr 18 '15 at 20:16
  • You just made a major rewrite of the question, including the subject line. – hpaulj Apr 19 '15 at 13:30

1 Answer


These are not equivalent actions. One just generates an array (and assigns it to a variable, a minor action). The other generates the array and fills a column of a structured array.

my_col = fn_returning_1D_array(...)
test['column1'] = fn_returning_1D_array(...)

I think a fairer comparison would be to fill in the columns of a 2D array.

In [38]: M,N=1000,10
In [39]: A1=np.zeros((M,N),'f')   # 2D array
In [40]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [41]: A2=np.zeros((M,),dtype=dt)   # structured array
In [42]: X=np.arange(M+0.0)

In [43]: A1[:,0]=X   # fill a column
In [44]: A2['f0']=X   # fill a field

In [45]: timeit for n in range(N):A1[:,n]=X
10000 loops, best of 3: 65.3 µs per loop

In [46]: timeit for n in dt.names: A2[n]=X
10000 loops, best of 3: 40.6 µs per loop

I'm a bit surprised that filling the structured array is faster.

Of course the fast way to fill the 2D array is with broadcasting:

In [50]: timeit A1[:,:]=X[:,None]
10000 loops, best of 3: 29.2 µs per loop

But the improvement over filling the fields is not that great.

I don't see anything significantly wrong with filling a structured array field by field. It's got to be faster than generating a list of tuples to fill the whole array.
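
To illustrate the list-of-tuples comparison, a rough sketch (sizes and the generated data are arbitrary):

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f' for _ in range(N)]))
X = np.arange(M, dtype='f')

# Field-by-field fill: one contiguous strided write per field.
A = np.zeros(M, dtype=dt)
for name in dt.names:
    A[name] = X

# Building the whole array from a list of tuples instead:
# every record goes through Python-level tuple construction.
rows = [tuple([x] * N) for x in X]
B = np.array(rows, dtype=dt)
```

Both produce identical arrays, but the tuple route pays Python-object overhead per record, which is why the field-by-field fill wins for large `M`.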

I believe A1 and A2 have identical data buffers. For example, if I make a zeros copy of A2, I can replace its data buffer with A1's and get a valid structured array:

In [64]: A3=np.zeros_like(A2)
In [65]: A3.data=A1.data

So the faster way of filling the structured array is to do the fastest 2D fill, followed by this data assignment.

But in the general case the challenge is to create a compatible 2D array. It's easy when all field dtypes are the same. With a mix of dtypes you'd have to work at the byte level. There are some advanced dtype specifications (with offsets, etc.) that may facilitate such a mapping.
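
As a sketch of what such a byte-level mapping could look like with mixed dtypes (the field names and sizes here are made up for illustration), an "overlay" dtype with an explicit offset can address a single field of each record without copying:

```python
import numpy as np

M = 1000
dt = np.dtype([('a', 'i4'), ('b', 'f8')])    # packed: itemsize 12
rec = np.zeros(M, dtype=dt)

# Overlay dtype exposing only 'b', at its byte offset within each record;
# itemsize must match so records stay aligned.
overlay = np.dtype({'names': ['b'], 'formats': ['f8'],
                    'offsets': [dt.fields['b'][1]],
                    'itemsize': dt.itemsize})

rec.view(overlay)['b'][:] = np.arange(M)     # fill 'b' through the view
```

This is really the same strided write as `rec['b'] = ...`, so by itself it buys nothing; the point is that the offsets/itemsize dict specification lets you describe an arbitrary byte layout, which is what a general 2D-to-structured mapping would need.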


Now you have shifted the focus to Fortran order. In the case of a 2D array that does help, but at the expense of row-oriented operations.

In [89]: A1=np.zeros((M,N),'f',order='F')

In [90]: timeit A1[:,:]=X[:,None]
100000 loops, best of 3: 18.2 µs per loop

One thing that you haven't mentioned, at least not before the last rewrite of the question, is how you intend to use this array. If it is just a place to store a number of arrays by name, you could use a straightforward Python dictionary:

In [96]: timeit D={name:X.copy() for name in dt.names}
10000 loops, best of 3: 25.2 µs per loop

Though this really is a time test for X.copy().

In any case, there isn't any equivalent to Fortran order when dealing with dtypes. None of the array operations (reshape, swapaxes, strides, broadcasting) crosses the 'dtype' boundary.

hpaulj
  • Thanks for taking the time to make this example. Two comments: 1. The assignment times are comparable since, as you mentioned, the memory layout of both is identical. Timings are radically different for Fortran-ordered "normal" arrays. 2. The timings are not equal because the arrays you are using are too small, so you are presumably seeing different Python overheads. See the timings in my amended question for illustration. - Thank you nevertheless! – ARF Apr 19 '15 at 07:41
  • Right, Fortran order only applies to the regular dimensions of an array. Any extra dimensionality added by the dtype is separate, and can't be transposed or reordered or reshaped. – hpaulj Apr 19 '15 at 08:46