numpy structured array inconsistency

Question

I'm writing a library that uses NumPy arrays and I have a scalar operation I would like to perform on any dtype. This works fine for most structured arrays, however I run into a problem when creating structured arrays with multiple dimensions for structured elements. As an example,

x = np.zeros(10, np.dtype('3float32,int8'))
print(x.dtype)
print(x.shape)

shows

[('f0', '<f4', (3,)), ('f1', 'i1')]
(10,)

but

x = np.zeros(10, np.dtype('3float32'))
print(x.dtype)
print(x.shape)

yields

float32
(10, 3)

that is, creating a structured array with a single multidimensional field appears to instead expand the array shape. This means that the number of dimensions for the last example is 2, not 1 as I was expecting. Is there anything I'm missing here, or a known workaround?

There are other, more detailed ways of specifying a compound dtype. You've just chosen a shorthand, and thus are subject to its translation rules. — hpaulj, Aug 29 '22 at 14:34
I don't think this is correct; there appears to be no way to get np.zeros to use a dtype of dtype((' — Eric J, Aug 30 '22 at 15:28

hpaulj · Answer 1 · 2022-08-29T15:45:05.143


Use the same dtype notation as displayed in the first working example:

In [92]: x = np.zeros(3, np.dtype([('f0','<f4',(3,))]))

In [93]: x
Out[93]: 
array([([0., 0., 0.],), ([0., 0., 0.],), ([0., 0., 0.],)],
      dtype=[('f0', '<f4', (3,))])

I don't normally use the string shorthand, but here is what it produces for a single entry:

In [99]: np.dtype('3float32')
Out[99]: dtype(('<f4', (3,)))     # no field name assigned

In [100]: np.zeros(3,_)
Out[100]: 
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], dtype=float32)

A couple of comma-separated strings create named fields:

In [102]: np.dtype('3float32,i4')
Out[102]: dtype([('f0', '<f4', (3,)), ('f1', '<i4')])
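
To put the two cases side by side, here is a minimal sketch (the variable names `sub`, `rec`, `a`, and `b` are just for illustration). It assumes the shorthand behaves as shown in [99] above, that is, a string with no comma gives a subarray dtype with no field names:

```python
import numpy as np

# Shorthand without a comma: a subarray dtype, not a structured dtype.
sub = np.dtype('3float32')
# Explicit list-of-tuples form: a structured dtype with one named field.
rec = np.dtype([('f0', '<f4', (3,))])

a = np.zeros(10, sub)  # subarray dimensions are absorbed into the array shape
b = np.zeros(10, rec)  # array stays 1-D; each element is a one-field record

print(sub.names, sub.subdtype)   # no field names, but a (base, shape) pair
print(a.shape, a.dtype)          # 2-D array of plain float32
print(b.shape, b.dtype.names)    # 1-D array with field 'f0'
print(b['f0'].shape)             # field access exposes the (10, 3) float view
```

Checking `dtype.names` (or `dtype.subdtype`) is one way for library code to tell the two apart before the array is created.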

edited Aug 29 '22 at 15:45

answered Aug 29 '22 at 15:31


I don't think this actually answers my question; for your second example, note that the shape of the array is (3, 3) and not (3,) -- that is, the dtype is now float32 and the ndims have been increased by one! – Eric J Aug 30 '22 at 15:25
`Out[93]` is 1d, with 1 field. What I'm trying to show is that the string shorthand is not adequate for defining a single-field dtype. You need to use the list-of-tuples approach. My [100] example is the same as yours; [99] just shows the dtype that's actually used by `np.zeros`. You want the longer dtype as displayed in [93] – hpaulj Aug 30 '22 at 16:28
The docs for the string shorthand specify "comma separated". It may seem picky, but '3float32' is not comma separated. To get a 1d, 1-field array you need to specify the field name. It might be instructive to do field access in the '3float32,int8' case. Look at `x['f0']` and `x[['f0']]`. One produces a 2d float array, the other a 1d, 1-field array. – hpaulj Aug 30 '22 at 17:44
So the real issue is that numpy cannot create a structured dtype that does not have field names. – Eric J Aug 30 '22 at 20:22
In my experience, fields with field names are an integral part of a compound dtype and structured array. A nameless field does not make sense. – hpaulj Aug 30 '22 at 22:02
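
The field-access comparison suggested in the comments can be sketched as below, together with one possible workaround for library code. Note that `as_single_field` and its default field name `'f0'` are hypothetical, not part of NumPy; the helper only relies on the `dtype.subdtype` attribute and the list-of-tuples dtype form used in the answer.

```python
import numpy as np

x = np.zeros(10, np.dtype('3float32,int8'))

plain = x['f0']     # single field name: a plain float32 array
record = x[['f0']]  # list of names: still a structured array with one field

print(plain.dtype, plain.shape)
print(record.dtype.names, record.shape)


def as_single_field(dtype, name='f0'):
    """Hypothetical helper: wrap a subarray dtype in a one-field structured dtype."""
    dtype = np.dtype(dtype)
    if dtype.subdtype is None:
        return dtype  # not a subarray dtype; leave it unchanged
    base, shape = dtype.subdtype
    return np.dtype([(name, base, shape)])


# With the wrapper, the array keeps its requested 1-D shape.
y = np.zeros(10, as_single_field('3float32'))
print(y.shape, y.dtype.names)
```

Whether an auto-generated field name is acceptable depends on the library; the point is only that a name has to come from somewhere for the array to stay 1-D.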
