I am trying to understand the meaning of ndarray.data field in numpy (see memory layout section of the reference page on N-dimensional arrays), especially for views into arrays. To quote the documentation:

ndarray.data -- Python buffer object pointing to the start of the array’s data

According to this description, I was expecting this to be a pointer to the C-array underlying the instance of ndarray.

Consider x = np.arange(5, dtype=np.float64).

Form y as a view into x using a slice: y = x[3:1:-1].

I was expecting x.data to point at the location of 0. and y.data to point at the location of 3., so that the address printed for y.data would be offset by 3*x.itemsize bytes from the address printed for x.data. This does not appear to be the case:

>>> import numpy as np
>>> x = np.arange(5, dtype=np.float64)
>>> y = x[3:1:-1]
>>> x.data
<memory at 0x000000F2F5150348>
>>> y.data
<memory at 0x000000F2F5150408>
>>> int('0x000000F2F5150408', 16) - int('0x000000F2F5150348', 16)
192
>>> 3*x.itemsize
24

The 'data' key in the __array_interface__ dictionary associated with the ndarray instance behaves more like I expect, although it may itself not be a pointer:

>>> y.__array_interface__['data'][0] - x.__array_interface__['data'][0]
24

So this raises the question: what does ndarray.data actually give?

Thanks in advance.

ivan_pozdeev
user40314
  • Since `y` is non-contiguous, it doesn't expose `data` (`>>>y.data AttributeError: cannot get single-segment buffer for discontiguous array`). So I can't see how you're going to compare `x.data` and `y.data`. (`numpy 1.11.1` and `python 2.7.12 win32` here.) – ivan_pozdeev Sep 14 '16 at 22:11
  • 192 = 3*64, just saying – toine Sep 14 '16 at 22:13
  • @ivan_pozdeev I am not getting this error from evaluation of `y.data` using numpy 1.11 on Windows and Linux using Python 3.5.2 from Anaconda distribution. What is your configuration? – user40314 Sep 14 '16 at 22:16
  • 24 bytes, 192 bits? – hpaulj Sep 14 '16 at 22:21
  • @user40314 I only said that it returns an error for me, I couldn't know about the cause of discrepancy. Since the question couldn't be answered without finding it out, I required additional info. – ivan_pozdeev Sep 14 '16 at 22:38

2 Answers

<memory at 0x000000F2F5150348> is a memoryview object located at address 0x000000F2F5150348; the buffer it provides access to is located somewhere else.

Memoryviews provide a number of operations described in the relevant official documentation, but at least on the Python-side API, they do not provide any way to access the raw address of the memory they expose. In particular, the number after "at" in the printed representation is not what you're looking for.
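To make the distinction concrete, here is a minimal sketch that parses the number out of the memoryview's repr and compares it with the buffer address NumPy reports. It assumes CPython, where id() returns an object's memory address and the default repr prints that same address; other interpreters may behave differently.

```python
import numpy as np

x = np.arange(5, dtype=np.float64)
m = x.data  # keep a reference so this memoryview object stays alive

# Number shown in "<memory at 0x...>", parsed back into an integer.
repr_addr = int(repr(m).split(" at ")[1].rstrip(">"), 16)

# Address of the first element of x's data buffer, as reported by NumPy.
buf_addr = x.__array_interface__["data"][0]

# CPython assumption: the repr shows the address of the memoryview object itself.
print(repr_addr == id(m))

# ndarray.ctypes.data reports the same buffer address as __array_interface__.
print(buf_addr == x.ctypes.data)

# The memoryview object and the buffer it exposes are separate allocations.
print(repr_addr == buf_addr)
```

Every evaluation of x.data builds a fresh memoryview object, so the difference between two printed addresses says nothing about how the underlying buffers are laid out.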

user2357112
Generally the number displayed by x.data isn't meant to be used by you. x.data is the buffer, which can be used in other contexts that expect a buffer.

np.frombuffer(x.data,dtype=float)

replicates your x.

np.frombuffer(x[3:].data,dtype=float)

This replicates x[3:]. But from Python you can't take x.data, add 192 bits (3*8*8, i.e. 24 bytes) to it, and expect to get x[3:].

I often use the __array_interface__['data'] value to check whether two variables share a data buffer, but I don't use that number for anything else. These are informative numbers, not working values.
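As a rough illustration of both points, the sketch below compares buffer addresses for the arrays from the question and rebuilds a contiguous slice from its buffer. The helper name data_address is hypothetical, introduced only for this example.

```python
import numpy as np

x = np.arange(5, dtype=np.float64)
y = x[3:1:-1]  # reversed view whose first element is x[3]

def data_address(a):
    """Hypothetical helper: address of the first element of a's data."""
    return a.__array_interface__["data"][0]

# The view starts three float64 items into x's buffer.
offset = data_address(y) - data_address(x)
print(offset, 3 * x.itemsize)

# A contiguous slice exposes a buffer that np.frombuffer can wrap directly.
z = np.frombuffer(x[3:].data, dtype=np.float64)
print(z)

# z is a view onto the same memory as x, not a copy.
print(np.shares_memory(x, z))
```

np.shares_memory is a more direct way to ask whether two arrays overlap than comparing addresses by hand.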

I recently explored this in "Creating a NumPy array directly from __array_interface__".

hpaulj
  • Thank you for your response and for the link. My confusion was in conflating `PyArray_DATA` on C-API with `ndarray.data`. – user40314 Sep 14 '16 at 22:36