Sub-arrays in numpy structured array not c contiguous

Question

I am currently trying to pack a number of arrays into numpy structured arrays. According to the numpy documentation:

Sub-arrays always have a C-contiguous memory layout.

But if I create a structured array with:

import numpy as np
x = np.zeros((2,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 5))])
x['a'].flags
# Out: C_CONTIGUOUS : False
#      F_CONTIGUOUS : False
#      OWNDATA : False
#      ...

While

x.flags
# Out: C_CONTIGUOUS : True
#      F_CONTIGUOUS : True
#      OWNDATA : True
#      ...

And using an "outer" shape of (1,) for the arrays yields:

x = np.zeros((1,), dtype=[('a', (np.float64, 5)),('b',(np.float64, 7))])
x['a'].flags
# Out: C_CONTIGUOUS : True
#      F_CONTIGUOUS : False
#      OWNDATA : False
#      ...

Omitting (1,), i.e. using the shape (), yields field arrays with ndim=1 that are C-contiguous. So the quote seems to hold only for the rows of a structured array.

What confuses me is that the fields are contiguous when I specify the full shape directly for each sub-array:

x = np.zeros((), dtype=[('a', (np.float64, (2, 5))), ('b', (np.float64, (2, 5)))])

x['a'].flags
#Out: C_CONTIGUOUS : True
#     F_CONTIGUOUS : False
#     OWNDATA : False
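Comparing strides (which numpy reports in bytes) makes the difference between these cases visible. A minimal check, assuming 8-byte float64 elements as in the examples above; `rows` and `block` are just illustrative names:

```python
import numpy as np

# Case 1: two records, each field holds 5 floats per record.
rows = np.zeros((2,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 5))])
# One record is 80 bytes (5 floats of 'a' followed by 5 floats of 'b'),
# so stepping to the next row of rows['a'] jumps over 80 bytes.
print(rows.dtype.itemsize)         # 80
print(rows['a'].strides)           # (80, 8)

# Case 2: a single (0-d) record whose fields carry the full (2, 5) shape.
block = np.zeros((), dtype=[('a', (np.float64, (2, 5))), ('b', (np.float64, (2, 5)))])
# All 10 floats of 'a' come first, then all 10 floats of 'b'.
print(block['a'].strides)          # (40, 8)
print(block.dtype.fields['b'][1])  # 80: byte offset of 'b' within the record
```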

From the quoted numpy documentation I thought that sub-arrays always have a C-contiguous memory layout, but this only seems to hold for single rows, or when the full shape of each sub-array is given in the dtype.
Where does this behavior come from? Does defining the "outer" shape (I don't know what to call it...) tell numpy to make row-wise sub-arrays of sub-arrays, while specifying the shape of each sub-array directly stores each sub-array contiguously?
What is the best way to handle this when the first dimension of all sub-arrays is the same but the second dimension differs? Should I specify each shape directly to keep them contiguous?
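One possible approach, sketched under the assumption that each field must be C-contiguous on its own, is the 0-d layout from the last example with the full shape written into each field. `n` and the widths 5 and 7 are illustrative values, and `np.ascontiguousarray` is shown as a fallback that makes a contiguous copy of a field from the per-row layout:

```python
import numpy as np

n = 2  # shared first dimension (illustrative)
dt = np.dtype([('a', (np.float64, (n, 5))),   # 'a' is n x 5
               ('b', (np.float64, (n, 7)))])  # 'b' is n x 7
packed = np.zeros((), dtype=dt)

# Each field is one block of memory; the blocks are stored one after the other.
a = packed['a']
b = packed['b']
print(a.shape, a.flags.c_contiguous)  # (2, 5) True
print(b.shape, b.flags.c_contiguous)  # (2, 7) True

# With a per-row layout instead, np.ascontiguousarray returns a
# contiguous *copy* of a field when some code needs one.
rows = np.zeros((n,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 7))])
a_copy = np.ascontiguousarray(rows['a'])
print(a_copy.flags.c_contiguous, np.shares_memory(a_copy, rows))  # True False
```

The trade-off is that `n` becomes part of the dtype, so arrays with a different first dimension need a different dtype.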

I don't follow your last sentence. In a dtype definition, the subarray dimensions have to be specified. — hpaulj, Aug 17 '18 at 16:04

hpaulj · Accepted Answer · 2018-08-17T16:10:29.080

With your dtype, the array memory is laid out as:

x[0]['a'], x[0]['b']
x[1]['a'], x[1]['b']
....

That is, each record of x consists of 5 elements for field 'a' followed by 5 elements for field 'b', and the next record follows in the same way.
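The layout can be read off the dtype itself: `dtype.fields` maps each field name to its type and its byte offset within a record. A small check, assuming 8-byte float64 elements:

```python
import numpy as np

dt = np.dtype([('a', (np.float64, 5)), ('b', (np.float64, 5))])
x = np.zeros((2,), dtype=dt)

print(dt.itemsize)        # 80: one record = 5 floats of 'a' + 5 floats of 'b'
print(dt.fields['a'][1])  # 0: 'a' starts at the beginning of each record
print(dt.fields['b'][1])  # 40: 'b' starts right after the 5 floats of 'a'
# Moving from x[0]['a'] to x[1]['a'] means jumping a whole record (80 bytes).
print(x['a'].strides)     # (80, 8)
```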

When the docs say subarrays are C contiguous, they are referring to the layout of the elements within one field of one record.

A view of field 'a' across records will not be contiguous - the elements of 'b' will separate elements of different records.
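A sketch of that distinction with the same dtype: field 'a' of one record is C-contiguous, the field across all records is not, and the gap between its rows is exactly the size of field 'b':

```python
import numpy as np

x = np.zeros((2,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 5))])

one = x['a'][0]   # field 'a' of record 0: 5 floats stored back to back
print(one.strides, one.flags.c_contiguous)  # (8,) True

every = x['a']    # field 'a' of every record
print(every.flags.c_contiguous)             # False

# Bytes skipped between the end of one row of 'a' and the start of the next:
# these are the 5 floats (40 bytes) of field 'b'.
gap = every.strides[0] - every.shape[1] * every.itemsize
print(gap)  # 40

# Both are views into x's buffer, not copies.
print(np.shares_memory(one, x), np.shares_memory(every, x))  # True True
```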

The same thing applies to column slices from a 2d array:

In [32]: w = np.zeros((2,10))
In [33]: w.flags
Out[33]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  ...
In [34]: w[:,:5].flags    # w[:,5:] elements are in between
Out[34]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  ...
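The analogy can be made literal here: because every field is float64 and there is no padding, the structured array can be reinterpreted as a (2, 10) float array, and field 'a' then describes the same memory as the column slice `[:, :5]`. A sketch:

```python
import numpy as np

x = np.zeros((2,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 5))])

# Reinterpret the 2 records of 80 bytes as 20 float64 values, then as 2 x 10.
w = x.view(np.float64).reshape(2, 10)

# Field 'a' and the first five columns of w have the same strides and memory.
print(w[:, :5].strides == x['a'].strides)  # True
print(np.shares_memory(w[:, :5], x['a']))  # True

# Writing through the field is visible in the 2-D view.
x['a'][1, 0] = 3.0
print(w[1, 0])  # 3.0
```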

This contiguity comment is more relevant when a subarray is 2d (as in your last example):

In [35]: dt=np.dtype([('a', (np.float64, 5)), ('b', (np.float64, (2,2)))])
In [36]: x=np.zeros((2,2,),dt,order='F')
In [37]: x.flags
Out[37]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True

In [39]: x[0,0]['b'].flags
Out[39]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False

While the array as a whole is F contiguous, a 'b' element is still 'C' contiguous.
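The strides of the whole 'b' field show both orders at once: the leading axes (the grid of records) step like a Fortran-ordered array, while the trailing axes (the subarray) step in C order. A check, assuming 8-byte float64 elements:

```python
import numpy as np

dt = np.dtype([('a', (np.float64, 5)), ('b', (np.float64, (2, 2)))])
x = np.zeros((2, 2), dt, order='F')

print(dt.itemsize)     # 72: 5 floats of 'a' + 4 floats of 'b'
# The outer (2, 2) grid of records is Fortran-ordered...
print(x.strides)       # (72, 144)
# ...but the trailing (2, 2) axes of each 'b' element are C-ordered.
print(x['b'].shape)    # (2, 2, 2, 2)
print(x['b'].strides)  # (72, 144, 16, 8)
```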

Define an array:

In [40]: x = np.array([(1,[2,3]),(4,[5,6])], dtype=[('a',int),('b',int,2)])
In [41]: x
Out[41]: array([(1, [2, 3]), (4, [5, 6])], dtype=[('a', '<i8'), ('b', '<i8', (2,))])

Viewing the array as a simple int dtype (not always possible):

In [42]: x.view(int)
Out[42]: array([1, 2, 3, 4, 5, 6])

The numbers are stored in memory consecutively. But the values for the 'b' field are not consecutive:

In [44]: x['b']
Out[44]: 
array([[2, 3],
       [5, 6]])

The values for 'a' come in between:

In [47]: x['a']
Out[47]: array([1, 4])
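The same picture follows from the dtype offsets. This sketch uses `np.int64` explicitly, since plain `int` may map to a 32-bit type on some platforms:

```python
import numpy as np

x = np.array([(1, [2, 3]), (4, [5, 6])],
             dtype=[('a', np.int64), ('b', np.int64, 2)])

print(x.dtype.itemsize)        # 24: one int for 'a' + two ints for 'b'
print(x.dtype.fields['b'][1])  # 8: 'b' starts right after 'a'
print(x.view(np.int64))        # [1 2 3 4 5 6]
# Rows of x['b'] are 24 bytes apart, so an 'a' value sits between them.
print(x['b'].strides)          # (24, 8)
```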

Thanks a lot for your help! I'll be able to post my full response on Monday. :) — JE_Muc, Aug 18 '18 at 15:26
Ok, now I can post a reply. Thanks again! From your post I conclude that all fields and records of a record array or structured array together occupy one contiguous block of memory, but the contiguity of an individual field can be broken by other fields interleaved with it. And the memory allocated for a record array as a whole always stays at the same fixed address. This also explains why I can't make a structured array field a view of another array, as in `x['a'] = some_array_of_same_shape`. This would have been my next question, but I guess your answer solved it as well. :) — JE_Muc, Aug 21 '18 at 11:24
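That last point can be checked directly: assigning to a field copies the values into the structured array's own buffer, it does not make the field refer to the other array. A small sketch, where `some_array` is an illustrative name:

```python
import numpy as np

x = np.zeros((2,), dtype=[('a', (np.float64, 5)), ('b', (np.float64, 5))])
some_array = np.arange(10.0).reshape(2, 5)

x['a'] = some_array                          # element-wise copy into x
print(np.shares_memory(x['a'], some_array))  # False

some_array[0, 0] = 99.0                      # later changes don't propagate
print(x['a'][0, 0])                          # 0.0
```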

