
I'm trying to get the memory address of data held within a PyObject* (from Python.h 3.8.2 specifically) so I can do a memcpy to a buffer. I've only been able to figure out how to copy the data out of the object but nothing on just getting the pointer. Say I have this object data ...

PyObject* data = PyLong_FromLong(100L);

As of now it seems my only option to get this data over to the buffer is to copy it out and then do a memcpy using the address of the temporary variable ...

long temp = PyLong_AsLong(data);
memcpy(buffer, &temp, 8);

This is being done thousands and thousands of times so I would assume it'd be faster if I'm able to get the memory address of the data and directly copy that over to my buffer like ...

memcpy(buffer, data->address_to_data(), 8)

instead of having that extra copy to the temporary variable.

Does anyone know if / how I can get the memory address of the long value from the PyObject* wrapper?

Appreciate the help!

DavidW
Tyler Weiss
  • What do you think about using [PyLong_AsVoidPtr](https://docs.python.org/3/c-api/long.html#c.PyLong_AsVoidPtr)? – JTejedor Sep 21 '21 at 06:57
  • A few things worth pointing out: 1) PyLong can store an arbitrarily large number (i.e. much larger than a C `long`) so there isn't really an internal `long` that you can access. 2) a "generic python object" can contain things like pointers to other Python objects, which need care when you copy them. 3) if you're looking for fast access to numeric values then maybe you should be using something like `array.array` with the buffer protocol – DavidW Sep 22 '21 at 21:21
  • @DavidW thank you for the detailed comment. Can you expand on "`array.array` with the buffer protocol" a little bit more. I am looking for the fastest way to get values put on to a buffer. – Tyler Weiss Sep 24 '21 at 03:17

3 Answers


This seems like an X-Y problem (i.e. you think you need to extract the data out of a bunch of Python objects at C level, but actually you would benefit from having a single Python object which exposes all your data).

A Python int can store (almost) arbitrarily large numbers:

>>> 1000**1000  # creates a very big int

i.e. it is not stored internally as a C long. In CPython 3.8 it is stored as an array of digits (ob_digit) whose length is the absolute value of ob_size (the sign of ob_size carries the sign of the number), in a slightly odd format that isn't much use to you. However, if you really wanted to copy it, you would cast your object pointer to a PyLongObject* and then do a memcpy(dest, my_int->ob_digit, sizeof(digit)*abs(my_int->ob_size));. I recommend against this because it's pretty hard for you to use this data.
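To make "slightly odd format" concrete, here is a minimal sketch of how such a digit array maps back to a number. It does not touch Python.h at all: the plain uint32_t array stands in for ob_digit, the signed size stands in for ob_size, and the 30-bit digit width is an assumption (it is the usual width on 64-bit builds, but some builds use 15). The names digits_to_int64 and SKETCH_SHIFT are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHIFT 30 /* assumed bits per digit; not read from Python.h */

/* Rebuild a value from a least-significant-first digit array.
   size mimics ob_size: its sign is the sign of the number and its
   absolute value is the digit count. Only meaningful for up to two
   digits here, since the result must fit in an int64_t. */
int64_t digits_to_int64(const uint32_t *digits, ptrdiff_t size)
{
    size_t ndigits = (size_t)(size < 0 ? -size : size);
    uint64_t magnitude = 0;

    /* Walk from the most significant digit down to the least. */
    for (size_t i = ndigits; i-- > 0; )
        magnitude = (magnitude << SKETCH_SHIFT) | digits[i];

    return size < 0 ? -(int64_t)magnitude : (int64_t)magnitude;
}
```

For example, the value 2**30 would be held as the two digits {0, 1}, so a raw memcpy of the digits does not give you the bytes of a C long.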

Obviously this only applies if you know you have a Python int. For a "generic PyObject*" this doesn't work, because a generic PyObject* can contain almost any data. This includes pointers which need ownership and/or reference counting (this especially applies to any PyObject that contains other PyObjects).


What I think you actually want is to store your data in a large array of C integers. This can be done with array.array, or numpy.array, or a variety of other classes.

At a C level these objects support the buffer protocol where they expose that internal array to C, allowing each of your values to be accessed, copied, manipulated, etc. from C.

Some quick untested illustrative code:

Py_buffer view;
// PyBUF_FORMAT asks the exporter to fill in view.format; you don't set it yourself
if (PyObject_GetBuffer(obj, &view, PyBUF_CONTIG | PyBUF_FORMAT | PyBUF_WRITABLE ) == -1) {
   // failed (an exception has been set)
   return NULL;
}

// you want to check that view.ndim == 1 (for a simple 1D array)
// and that view.format is "l" (an array of longs), e.g. with strcmp
long* data = (long*)view.buf;
// At this point you can access data as a C array of
// view.len / view.itemsize elements (view.len is in bytes)

// When you've finished:
PyBuffer_Release(&view);
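Once you hold a view, the bulk copy you were after is a single memcpy. In the buffer protocol, len counts bytes, so the element count is len / itemsize. The sketch below shows only that arithmetic and deliberately avoids Python.h: struct view_sketch is a hypothetical stand-in for the three Py_buffer fields involved (buf, len, itemsize), and copy_items is a made-up helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the Py_buffer fields used below. */
struct view_sketch {
    void *buf;          /* start of the exported memory */
    ptrdiff_t len;      /* total size in bytes */
    ptrdiff_t itemsize; /* size of one element in bytes */
};

/* Copy as many whole elements as fit into dest.
   Returns the number of elements copied. */
size_t copy_items(const struct view_sketch *view, void *dest, size_t dest_bytes)
{
    if (view->itemsize <= 0 || view->len <= 0)
        return 0;

    size_t itemsize = (size_t)view->itemsize;
    size_t count = (size_t)view->len / itemsize; /* elements available */
    size_t fit = dest_bytes / itemsize;          /* elements dest can hold */

    if (count > fit)
        count = fit;
    memcpy(dest, view->buf, count * itemsize);
    return count;
}
```

With a real Py_buffer you would pass view.buf, view.len and view.itemsize in the same way, and there would be no per-element PyLong_AsLong call at all.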
DavidW

This seems to be a design issue relating to abstraction of data structures. Often, it is desirable to give the user an opaque data structure or pointer. Access to internal elements would require a method (or function) call.

From https://docs.python.org/3/c-api/long.html,

PyObject* PyLong_FromLong(long v)

Return value: New reference.
Return a new PyLongObject object from v, or NULL on failure.

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.

The call does not necessarily create anything new. For values from -5 to 256 it returns a reference to a shared, preallocated object; for other values it allocates a new PyLongObject. Even if you locate the internal memory location, there is no guarantee the layout will remain consistent between versions.

PyObject seems to be designed to be opaque. Treat it as such.

moi
  • Thank you for the detailed response, what would be the best method to get that `PyObject*` put on a buffer as a long? Are the steps I'm following in the original post the best I'm going to get? – Tyler Weiss Sep 24 '21 at 03:21
  • What's wrong with the method that you use? "As of now it seems my only option to get this data over to the buffer is to copy it out and then do a memcpy using the address of the temporary variable": `long temp = PyLong_AsLong(data); memcpy(buffer, &temp, 8);` – moi Sep 24 '21 at 12:04
  • It would be faster if i didn't have that extra copy from the long value PyLong_AsLong constructs to my temporary variable. Ideally, there would be some Py function to construct the long value directly onto the buffer. I understand that there most likely isn't that option. I just wanted to make sure that what I have in my original post is my best option for performance. – Tyler Weiss Sep 24 '21 at 17:30
  • It's not the best option - but the nature of an opaque object requires a lack of direct memory access. It has to do with the way objects work. Sadly, you will have to live with this performance hit. – moi Sep 25 '21 at 10:03

There is an internal CPython function that does something like what you want, called _PyLong_AsByteArray. The leading underscore marks it as private API, so its signature can change between Python versions.

It seems to read the needed bytes from a field called ob_digit, but I don't fully get the whole function.
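To give a rough idea of the kind of work such a conversion involves, here is a sketch that repacks a digit array into little-endian bytes. This is not CPython's code and does not call _PyLong_AsByteArray: it assumes 30-bit digits stored least significant first, handles non-negative values only, and uses the made-up names digits_to_le_bytes and SKETCH_SHIFT.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHIFT 30 /* assumed bits per digit */

/* Write a non-negative digit array as n little-endian bytes.
   Returns 0 on success, -1 if the value does not fit in n bytes. */
int digits_to_le_bytes(const uint32_t *digits, size_t ndigits,
                       unsigned char *out, size_t n)
{
    uint64_t acc = 0;      /* pending bits, lowest first */
    unsigned acc_bits = 0; /* how many bits of acc are pending */
    size_t written = 0;

    for (size_t i = 0; i < ndigits; i++) {
        /* At most 7 bits are pending here, so 7 + 30 bits fit easily. */
        acc |= (uint64_t)digits[i] << acc_bits;
        acc_bits += SKETCH_SHIFT;

        /* Flush every complete byte. */
        while (acc_bits >= 8) {
            unsigned char byte = (unsigned char)(acc & 0xFF);
            if (written < n)
                out[written++] = byte;
            else if (byte != 0)
                return -1; /* significant bits beyond n bytes */
            acc >>= 8;
            acc_bits -= 8;
        }
    }

    /* Fewer than 8 bits remain; emit them if they are non-zero. */
    if (acc != 0) {
        if (written == n)
            return -1;
        out[written++] = (unsigned char)acc;
    }

    /* Zero-fill the rest of the output. */
    while (written < n)
        out[written++] = 0;
    return 0;
}
```

The bit shuffling is the point: the digits are not byte-aligned, so there is no address inside the object that you could hand straight to memcpy.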

unddoch