
I have a Python set containing a collection of non-hashable Python objects of uniform type, which I want to process.

To improve the efficiency of my algorithms, I would like to use ctypes to interface with an external index implementation that accepts only uint64 data values.

I was hoping I could pass pointer references to the Python objects into this external library as uint64 values. Is that possible?

I tried `ctypes.cast(ctypes.py_object(my_python_object), ctypes.c_uint64)`, but I get `ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type`.
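A likely explanation, assuming CPython's ctypes: `ctypes.cast()` passes its first argument through a `c_void_p` parameter, and a `py_object` instance is not one of the things that conversion accepts, so the call fails on argument 1 before the target type is even checked (and `c_uint64` is not a pointer type, so it would be rejected as a target anyway). The sketch below reproduces the failure and then takes the integer from `id()` instead; `Item` is a hypothetical stand-in for the real objects.

```python
import ctypes


class Item:
    """Hypothetical stand-in for the objects stored in the set."""


obj = Item()

# ctypes.cast() converts its first argument to c_void_p; a py_object
# instance is not accepted there, which produces the "wrong type" error.
try:
    ctypes.cast(ctypes.py_object(obj), ctypes.c_uint64)
except (ctypes.ArgumentError, TypeError) as exc:
    print("cast failed:", exc)

# id() already yields an integer that is unique while obj is alive
# (in CPython, the object's address), so it can be wrapped as a uint64.
key = ctypes.c_uint64(id(obj))
print(hex(key.value))
```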

Also, what about the reverse: taking a uint64 reference to a Python object and turning it back into a "real" Python object?

ARF
  • Seriously, what's wrong with using `id()`? – Antti Haapala -- Слава Україні Apr 14 '16 at 18:32
  • @AnttiHaapala Seriously nothing. :-) Just that I never realised it existed. Any chance of getting a python object from the `id()` value? If not, I could always convert my `set` into a `dict` with the ids as keys. – ARF Apr 14 '16 at 18:36
  • The thing is... If your object is dead, you cannot get it back from the `id()` - instead you will crash your interpreter :D – Antti Haapala -- Слава Україні Apr 14 '16 at 18:37
  • Would tools like Cython or manually writing a small extension module to interface with the C code be an option? – user2357112 Apr 14 '16 at 18:43
  • What you're doing seems dubious, but you haven't provided enough details to say one way or the other. Anyway, if you're working directly on Python objects using a C library, make sure to load it as a `PyDLL` instance that holds the GIL when calling functions. Then just set the function's `argtypes`, with the Python object parameter defined as `py_object`. The C function will handle this as a `uint64_t`. Passing the object directly increments the reference count during the call, so there's no danger of the object getting deallocated on another thread. – Eryk Sun Apr 14 '16 at 19:20
  • @AnttiHaapala The objects will still be alive, because they remain in the set. The index only stores some unique object id (it need not be a pointer). Hence, the `id()` function seems perfect. – ARF Apr 14 '16 at 19:35
  • @user2357112 I want to avoid Cython. With the `id()` function, it is also no longer as dubious as it initially seemed. I will store the objects in a `dict` keyed by their `id()` values. Only the `id()` values are stored in and retrieved from the external index. The object is then retrieved from the dict using the id as key. No messing with references, pointers, etc. - Seems very safe. – ARF Apr 14 '16 at 19:37
  • @eryksun Many thanks for the advice. I am sure I will make use of it in future projects. – ARF Apr 14 '16 at 19:43
  • Why not use it now? If you have a C function `foo`, surely it's simpler to call `lib.foo(x)` than `lib.foo(id(x))`. You just have the one-time setup of `lib.foo.argtypes = [py_object]`. – Eryk Sun Apr 14 '16 at 19:45
  • @eryksun Indeed, but I am extending somebody else's python package by subclassing a class that already handles the ctypes part. With `id()` it will be easier to have the PR accepted because I am not touching any low-level parts of the package: my PR will be self-contained. – ARF Apr 14 '16 at 21:59
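The approach ARF settles on (a `dict` keyed by `id()`, with only the integers going through the index) can be sketched roughly as follows. `FakeIndex` is a hypothetical pure-Python stand-in for the external uint64-only index, and `Item` for the real objects; the point is that ids coming back from the index are looked up in the dict, never dereferenced.

```python
import ctypes


class Item:
    """Hypothetical stand-in for the objects being indexed."""

    def __init__(self, name):
        self.name = name


class FakeIndex:
    """Hypothetical stand-in for the external index: it only sees uint64s."""

    def __init__(self):
        self._values = []

    def add(self, value):
        # Store the value as the C side would: as an unsigned 64-bit integer.
        self._values.append(ctypes.c_uint64(value).value)

    def values(self):
        return list(self._values)


items = {Item("a"), Item("b"), Item("c")}

# The dict holds strong references, so no object in it can die and have
# its id reused by a different object while the index may return that id.
by_id = {id(item): item for item in items}

index = FakeIndex()
for key in by_id:
    index.add(key)

# Integers coming back from the index are resolved through the dict.
recovered = [by_id[v] for v in index.values()]
print(sorted(item.name for item in recovered))
```

Because `by_id` already keeps every object alive, the original set is not strictly needed once the dict exists.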
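Eryk Sun's alternative (a `PyDLL` plus `argtypes = [py_object]`) can be tried without a custom library by using `ctypes.pythonapi`, which is itself a `PyDLL` for the running interpreter. The sketch below passes an object straight to the real C-API function `PyObject_Hash`; a real index library would instead be loaded with something like `ctypes.PyDLL("./libindex.so")`, where that path is purely hypothetical. On the C side the argument arrives as a `PyObject*`, which on a 64-bit build is the same width as a `uint64_t`.

```python
import ctypes

# ctypes.pythonapi is a PyDLL, so the GIL stays held during calls, as it
# would for a hypothetical ctypes.PyDLL("./libindex.so").
PyObject_Hash = ctypes.pythonapi.PyObject_Hash
PyObject_Hash.argtypes = [ctypes.py_object]  # pass the object itself
PyObject_Hash.restype = ctypes.c_ssize_t     # Py_hash_t is Py_ssize_t-sized


class Item:
    """Hypothetical stand-in for the objects being indexed."""


obj = Item()

# The C function receives obj as a PyObject*; the call's own arguments keep
# obj referenced while the C code runs.
h = PyObject_Hash(obj)
print(h == hash(obj))
```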

1 Answer


Why wouldn't you simply use the `id()` function in CPython?

```python
>>> x = object()
>>> x
<object object at 0x7fd2fc742090>
>>> hex(id(x))
'0x7fd2fc742090'
```

The CPython documentation for `id()` says:

id(object)

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.
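For the question's use case, the documented guarantee is what matters: objects that are alive at the same time (for example, because the set references them) have distinct ids, and each id stays fixed while its object lives. A small check of that, with `Item` as a hypothetical stand-in for the real objects:

```python
class Item:
    """Hypothetical stand-in for the objects in the set."""


items = {Item() for _ in range(1000)}

# Every object is alive simultaneously (the set references them),
# so their ids must all be distinct...
ids = [id(item) for item in items]
assert len(set(ids)) == len(items)

# ...and each id stays the same for as long as its object lives
# (the set is not modified, so it iterates in the same order).
assert all(id(item) == i for item, i in zip(items, ids))

# In CPython, id() is a non-negative address, so it fits a uint64.
assert all(0 <= i < 2**64 for i in ids)
```

Nothing is promised once an object has been removed from the set and freed: its id may then be handed to a new object.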


You also need to mess with reference counts and such if you want to "convert" this uint64_t of yours back into a Python object. As far as I know, ctypes does not easily let one increase or decrease the reference counts of Python objects.
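For the reverse direction, one commonly used CPython-only idiom is `ctypes.cast(address, ctypes.py_object).value`. The returned reference is counted normally, but nothing about it keeps the object alive: it is only safe while some other reference (here, the set) exists, and on a dead or invalid address it crashes the interpreter, as noted in the comments. `Item` is a hypothetical stand-in.

```python
import ctypes


class Item:
    """Hypothetical stand-in for one of the objects in the set."""


items = {Item() for _ in range(3)}  # the set keeps every object alive
obj = next(iter(items))
addr = id(obj)  # CPython: the object's address

# CPython-only: reinterpret the address as a PyObject* and read it back.
# Safe only because `items` still references the object.
back = ctypes.cast(addr, ctypes.py_object).value
print(back is obj)
```

A dict keyed by `id()` gives the same result without touching raw addresses, which is why it is the safer choice when the index only needs some unique integer.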

Community
  • Strictly speaking, this implementation detail could change in the future. – user2357112 Apr 14 '16 at 18:38
  • The very similar case of `object.__hash__` [actually has changed](http://bugs.python.org/issue5186) in the past; it no longer just returns `id(self)`. I wouldn't be entirely surprised if they ever decide to change `id`, for example, to reduce the same kind of bucket collisions that occurred with `object.__hash__`. – user2357112 Apr 14 '16 at 18:55