I think I can do: np.zeros((), dtype=dt).strides, but this doesn't seem efficient when the dtype is a large array type like: ('<f8', (200, 100)). Is there a way of going directly from dtype to strides in numpy?

Neil G
  • The strides aren't a property of the `dtype`, they're a property of the array. `x.strides` depends on `x.shape`. Furthermore, the strides reflect the ordering of the array in memory (e.g. C vs Fortran order or things like `x = x[::2]`). Therefore strides are specific to a specific memory layout of a specific array of a specific shape. – Joe Kington Sep 30 '15 at 18:02
  • @JoeKington: See my example. The dtype can have a shape within it and that's what I'm trying to get strides from. The memory order is also specified in the dtype ("<"). – Neil G Sep 30 '15 at 18:04
  • Ah, that makes more sense now! I missed that part. Sorry for the misunderstanding. – Joe Kington Sep 30 '15 at 18:04
  • I think fields of a dtype with a shape within a structured array are required to be in C-order, but I'm not 100% sure on that. At any rate, the `<` specifies little-endian, not memory order. – Joe Kington Sep 30 '15 at 18:10
  • @JoeKington: Good point. – Neil G Sep 30 '15 at 18:12

2 Answers

You can actually get the strides of a sub-array within a structured array without creating the "full" array.

Sub-arrays within a structured array are required to be contiguous and in C-order according to the documentation. Note the sentence just above the first example:

Sub-arrays always have a C-contiguous memory layout.

Therefore, for a sub-array dtype with no fields, such as the one in your example, you can do (as an unreadable one-liner):

import numpy as np

x = np.dtype(('<f8', (200, 100)))

strides = x.base.itemsize * np.r_[1, np.cumprod(x.shape[::-1][:-1])][::-1]

Avoiding the code golf:

shape = list(x.shape)

# First, let's make the strides for an array with an itemsize of 1 in C-order
tmp_strides = shape[::-1]
tmp_strides[1:] = list(np.cumprod(tmp_strides[:-1]))
tmp_strides[0] = 1

# Now adjust it for the real itemsize:
tmp_strides = x.base.itemsize * np.array(tmp_strides)

# And convert it to a tuple, reversing it back for proper C-order
strides = tuple(tmp_strides[::-1])

This gets more complex when there are multiple fields, however. You'd need to put in appropriate checks in general. For example: Does the dtype have a shape attribute? Does it have fields? Do any fields have shape attributes?
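As a rough sketch of what those checks could look like (the helper names `c_order_strides` and `describe_dtype` are made up for illustration, not part of numpy), the following handles both a bare sub-array dtype and a structured dtype with fields. It assumes only the documented guarantee that sub-arrays are C-contiguous, and it does not recurse into nested structured fields:

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous block with the given shape."""
    strides = []
    step = itemsize
    # Walk from the last (fastest-varying) axis to the first.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def describe_dtype(dt):
    """Map each field name (or None for a field-less dtype) to
    (byte offset, strides of that field's sub-array)."""
    dt = np.dtype(dt)
    if dt.fields is None:
        # Plain or sub-array dtype; dt.shape is () when there is no sub-array.
        return {None: (0, c_order_strides(dt.shape, dt.base.itemsize))}
    result = {}
    for name in dt.names:
        field_dt, offset = dt.fields[name][:2]
        result[name] = (
            offset,
            c_order_strides(field_dt.shape, field_dt.base.itemsize),
        )
    return result


print(describe_dtype(("<f8", (200, 100))))
```

No array of the full size is ever allocated here; only the dtype's `shape`, `base` and `fields` attributes are read.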

Joe Kington
  • I had it using the itemsize in my old code. Is it guaranteed to store the elements of the array contiguously? – Neil G Sep 30 '15 at 19:41
  • If I understand the documentation correctly, the elements of the subarray are guaranteed to be contiguous and in C-order. (Also I have a mistake in the long version of the code in my answer. Fixing now. The one-liner is correct, though.) – Joe Kington Sep 30 '15 at 19:43
  • Yes, sorry for the broken link initially: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html Have a look at the sentence above the first example. – Joe Kington Sep 30 '15 at 19:45
  • Aha! Yes you're right. I used to do this in my old code, but then I wanted to use strides to be super safe. – Neil G Sep 30 '15 at 19:45

I think you are talking about an array with:

In [257]: dt=np.dtype([('f0',float, (200,100))])
In [258]: x=np.zeros((),dtype=dt)

The array itself is 0d with one item.

In [259]: x.strides
Out[259]: ()

That item has shape and strides determined by the dtype:

In [260]: x['f0'].strides
Out[260]: (800, 8)
In [261]: x['f0'].shape
Out[261]: (200, 100)

But is constructing x any different than constructing a plain float array with the same shape?

In [262]: y=np.zeros((200,100),float)
In [263]: y.strides
Out[263]: (800, 8)

You can't get the strides of a potential y without actually constructing it.

The IPython whos command shows x and y take up about the same space:

x          ndarray       : 1 elems, type `[('f0', '<f8', (200, 100))]`,
   160000 bytes (156.25 kb)
y          ndarray       200x100: 20000 elems, type `float64`, 
   160000 bytes (156.25 kb)

An interesting question is whether such an x['f0'] has all the properties of y. You can probably read all the properties, but may be limited in which ones you can change.


You can parse the dtype:

In [309]: dt=np.dtype([('f0',float, (200,100))])
In [310]: dt.fields
Out[310]: mappingproxy({'f0': (dtype(('<f8', (200, 100))), 0)})
In [311]: dt[0]
Out[311]: dtype(('<f8', (200, 100)))
In [312]: dt[0].shape
Out[312]: (200, 100)
In [324]: dt[0].base
Out[324]: dtype('float64')

I don't see a strides-like attribute of dt or dt[0]. There may be some numpy function that calculates the strides, based on shape, but it probably is hidden. You could search the np.lib.stride_tricks module. That's where as_strided is found.

From the (200,100) shape, and float64 taking 8 bytes, it is possible to calculate that the normal (default) strides are (8*100, 8).

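For a dtype that isn't further nested, this seems to work (as written it is only correct for 2-D sub-array shapes; more dimensions need a cumulative product of the trailing dimensions):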

In [374]: dt[0]
Out[374]: dtype(('<f8', (200, 100)))
In [375]: tuple(np.array(dt[0].shape[1:]+(1,))*dt[0].base.itemsize)
Out[375]: (800, 8)

Let's make a more complex array with this dtype:

In [346]: x=np.zeros((3,1),dtype=dt)
In [347]: x.shape
Out[347]: (3, 1)
In [348]: x.strides
Out[348]: (160000, 160000)

Its strides depend on the shape and itemsize. But the shape and strides of a field are 4d. Can we say they exist without actually accessing the field?

In [349]: x['f0'].strides
Out[349]: (160000, 160000, 800, 8)

Strides for an item:

In [350]: x[0,0]['f0'].strides
Out[350]: (800, 8)

How about double nesting?

In [390]: dt1=np.dtype([('f0',np.dtype([('f00',int,(3,4))]), (20,10))])
In [391]: z=np.zeros((),dt1)
In [392]: z['f0']['f00'].shape
Out[392]: (20, 10, 3, 4)
In [393]: z['f0']['f00'].strides
Out[393]: (480, 48, 16, 4)
In [399]: (np.cumprod(np.array((10,3,4,1))[::-1])*4)[::-1]
Out[399]: array([480,  48,  16,   4], dtype=int32)
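The same arithmetic can be done by walking the dtype rather than typing the dimensions in by hand. This is only a sketch: `nested_field_strides` and `c_order_strides` are hypothetical helpers, `np.int32` is spelled out so the result does not depend on the platform's default int size, and the array holding the dtype is assumed to be 0d as in the session above:

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous block with the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def nested_field_strides(dt, path):
    """Strides contributed by following a chain of field names."""
    dt = np.dtype(dt)
    strides = ()
    for name in path:
        field_dt = dt.fields[name][0]
        # Each level adds the strides of its own sub-array, measured in
        # units of that level's base itemsize.
        strides += c_order_strides(field_dt.shape, field_dt.base.itemsize)
        # Descend into the base dtype, which may itself have fields.
        dt = field_dt.base
    return strides


inner = np.dtype([("f00", np.int32, (3, 4))])
dt1 = np.dtype([("f0", inner, (20, 10))])
print(nested_field_strides(dt1, ["f0", "f00"]))  # (480, 48, 16, 4)
```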

Correction: the striding for a field is a combination of the striding for the array as a whole plus the striding for the field. It can be seen with a multifield dtype:

In [430]: dt=np.dtype([('f0',float, (3,4)),('f1',int),('f2',int,(2,))])
In [431]: x=np.zeros((3,2),dt)
In [432]: x.shape
Out[432]: (3, 2)
In [433]: x.strides
Out[433]: (216, 108)
In [434]: x['f0'].shape
Out[434]: (3, 2, 3, 4)
In [435]: x['f0'].strides
Out[435]: (216, 108, 32, 8)

(216,108) is the striding for the whole array (itemsize is 108), concatenated with the striding for the f0 field, (32,8) (itemsize 8).
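That concatenation can be predicted from the intended array shape and the dtype alone. The sketch below assumes the array will be a freshly allocated C-order array (as `np.zeros` produces by default) and uses made-up helper names; `np.int32` is written explicitly so the 108-byte itemsize matches the session above on any platform:

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous block with the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def predicted_field_strides(array_shape, dt, name):
    """Expected strides of arr[name] for a new C-order array of this shape."""
    dt = np.dtype(dt)
    field_dt = dt.fields[name][0]
    # Outer part: stepping between whole records of dt.itemsize bytes.
    outer = c_order_strides(array_shape, dt.itemsize)
    # Inner part: stepping inside the field's own sub-array.
    inner = c_order_strides(field_dt.shape, field_dt.base.itemsize)
    return outer + inner


dt = np.dtype([("f0", np.float64, (3, 4)), ("f1", np.int32), ("f2", np.int32, (2,))])
print(predicted_field_strides((3, 2), dt, "f0"))  # (216, 108, 32, 8)
```

For an array that is a view, a slice, or in Fortran order, the outer part would differ, so the actual `strides` attribute is still the only reliable source there.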

hpaulj
  • That's my point: You should be able to get the strides without reconstructing the array. Also, to answer your question, the reason I have a dtype with a shape is that it exists as part of a more complicated dtype whose corresponding array will have an additional shape when it is created (later). – Neil G Sep 30 '15 at 19:16
  • Regarding your edit: I know about fields and reading the sub-dtype shapes. What I want are the strides. – Neil G Sep 30 '15 at 19:32
  • I can calculate strides from a shape and base dtype, assuming normal construction. But I have to have an actual array to find a strides attribute. – hpaulj Sep 30 '15 at 20:00
  • You're right. I didn't realize that the array was stored contiguously and it was just a matter of calculation. – Neil G Sep 30 '15 at 20:39