I have a numpy structured array of the following form:

import numpy as np
x = np.array([(1,2,3)]*2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])

I now want to generate views into this array that pair 't' with either 'x' or 'y'. The usual multi-field indexing syntax creates a copy:

v_copy = x[['t', 'y']]
v_copy
#array([(1, 3), (1, 3)], 
#     dtype=[('t', '<i2'), ('y', '|i1')])

v_copy.base is None
#True

This is not unexpected, since picking two fields is "fancy indexing", at which point numpy gives up and makes a copy. Since my actual records are large, I want to avoid the copy at all costs.

The required elements can nevertheless be addressed within numpy's strided memory model. Looking at the individual bytes in memory:

x.view(np.int8)
#array([1, 0, 2, 3, 1, 0, 2, 3], dtype=int8)

one can figure out the necessary strides:

v = np.recarray((2,2), [('b', np.int8)], buf=x, strides=(4,3))
v
#rec.array([[(1,), (3,)],
#    [(1,), (3,)]], 
#    dtype=[('b', '|i1')])
v.base is x
#True
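
As a minimal sketch of where the strides (4,3) come from, the byte offsets can be read off the dtype rather than worked out by hand. The names `offsets`, `record_stride`, and `field_stride` are illustrative only, and the sketch assumes the default packed (unaligned) layout of the dtype above:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])

# dtype.fields maps each field name to (dtype, byte offset within the record).
offsets = {name: x.dtype.fields[name][1] for name in x.dtype.names}

# Stepping from one record to the next skips one full record.
record_stride = x.strides[0]

# Stepping from the first byte of 't' to 'y' inside one record.
field_stride = offsets['y'] - offsets['t']

print(offsets, record_stride, field_stride)
```

With this layout the record stride is the itemsize of the dtype and the inner stride is the distance between the two field offsets.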

Clearly, v points to the correct locations in memory without having created a copy. Unfortunately, numpy won't allow me to reinterpret these memory locations as the original data types:

v_view = v.view([('t', np.int16), ('y', np.int8)])
#ValueError: new type not compatible with array.

Is there a way to trick numpy into doing this cast, so that an array v_view equivalent to v_copy is created, but without having made a copy? Perhaps working directly on v.__array_interface__, as is done in np.lib.stride_tricks.as_strided()?

Stefan

2 Answers

You can construct a suitable dtype like so

dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))

and then do

y = np.recarray(x.shape, buf=x, strides=x.strides, dtype=dt2)

In NumPy versions newer than 1.6, where a dtype specification accepts an `itemsize` key, you can also do

dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2), itemsize=4))
y = x.view(dt2)
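
To check that such a view really shares memory with the original, one can write through it and look at the source array. This is a minimal sketch, assuming a NumPy version that accepts the `itemsize` key and provides `np.shares_memory`; it uses the ('t', 'y') pairing with offsets (0, 3), and the name `dt_ty` is illustrative:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])

# Same itemsize as x.dtype, but only 't' (offset 0) and 'y' (offset 3) are exposed.
dt_ty = np.dtype(dict(names=('t', 'y'), formats=(np.int16, np.int8),
                      offsets=(0, 3), itemsize=4))
y = x.view(dt_ty)

# Writing through the view changes the original array, so no copy was made.
y['y'][0] = 42
print(x['y'], np.shares_memory(x, y))
```

The hidden field 'x' is untouched by the write, since the view only addresses the bytes at the listed offsets.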
pv.
  • Great! So I missed that data types can be defined with `offsets`... Your first version with `np.recarray` works for both `['t','x']` and `['t','y']` views. The second (more elegant) one works for `offsets=(0,3)` (`['t','y']`) but *not* for `offsets=(0,2)` (`['t','x']`). I specifically upgraded from numpy 1.6.1 to 1.6.2, to no avail. Any ideas why the `offsets=(0,2)` case doesn't work for me? – Stefan Aug 03 '12 at 11:59
  • The `itemsize` keyword was added in NumPy > 1.6 (i.e. currently only in development versions), so with 1.6.2 you get a dtype whose itemsize is incompatible with the array's itemsize. – pv. Aug 03 '12 at 16:55
  • Thanks! I tested this with numpy 1.8dev and it works. However, I wasn't ready to switch to the development version, since it would require recompiling all modules. So I did some more research and found that one can achieve the same effect by using the `np.ndarray` constructor. I posted this as a separate answer. – Stefan Aug 05 '12 at 10:06

This works with numpy 1.6.x and avoids creating a recarray:

dt2 = {'t': (np.int16, 0), 'y': (np.int8, 3)}
v_view = np.ndarray(x.shape, dtype=dt2, buffer=x, strides=x.strides)
v_view
#array([(1, 3), (1, 3)], 
#    dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True

One can wrap this in a class that subclasses np.ndarray:

class arrayview(np.ndarray):
    def __new__(subtype, x, fields):
        # x.dtype.fields[f] is a (dtype, byte offset) pair, which the dict dtype spec accepts
        dtype = {f: x.dtype.fields[f] for f in fields}
        return np.ndarray.__new__(subtype, x.shape, dtype,
                                  buffer=x, strides=x.strides)

v_view = arrayview(x, ('t', 'y'))
v_view
#arrayview([(1, 3), (1, 3)], 
#    dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
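
If a subclass is not needed, the same idea may be sketched as a plain function. `field_view` is a hypothetical helper name, not part of NumPy; the sketch assumes the input is a contiguous, writable structured array and a NumPy version that accepts the `itemsize` key in a dict dtype:

```python
import numpy as np

def field_view(arr, fields):
    """Return a no-copy view of `arr` exposing only `fields` (hypothetical helper)."""
    dtype = np.dtype({
        'names': list(fields),
        'formats': [arr.dtype.fields[f][0] for f in fields],
        'offsets': [arr.dtype.fields[f][1] for f in fields],
        # Keep the full record size so the view's items line up with arr's records.
        'itemsize': arr.dtype.itemsize,
    })
    return np.ndarray(arr.shape, dtype=dtype, buffer=arr, strides=arr.strides)

x = np.array([(1, 2, 3)] * 2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])
v = field_view(x, ('t', 'y'))

# Writes through the view land in x, so the two share memory.
v['y'][:] = 7
print(x)
```

Setting `itemsize` explicitly keeps the view's record size equal to that of the source array regardless of which fields are selected.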
Stefan