2

Is there any way to access the underlying structures in Cython/C++ for polars?

I have a number of scripts that grab np.ndarrays and iterate over them natively. Is there anything similar for polars?

Michael WS
  • Should be possible. If it already exposes a byte interface that you can wrap in a memoryview, you should be able to decipher it pretty easily. Otherwise, look at what columnar format it uses underneath and write a C++ equivalent of the Rust. – Abel Jul 24 '22 at 22:33
  • I am not quite sure how to see the arrow data – Michael WS Jul 25 '22 at 00:23
  • It's Arrow under the hood, but I don't see how to access it natively – Michael WS Aug 03 '22 at 10:11

3 Answers

1

It should definitely be possible. Polars memory can be exported to pyarrow with zero copy, and from there you can use Arrow's C data interface to get hold of that memory.

Here is an example in the polars repo where they use the C data interface to get a hold of that memory again in Rust. https://github.com/pola-rs/polars/tree/master/examples/python_rust_compiled_function

ritchie46
1

polars.DataFrame([1,2,3]).to_arrow() returns a pyarrow.Table, and the C++ table behind it can be unwrapped and accessed from Cython:

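 from libcpp.memory cimport shared_ptr
 from pyarrow.includes.libarrow cimport CTable
 from pyarrow.lib cimport pyarrow_unwrap_table

 cdef iterate_through_table(polars_obj):
     # Keep the pyarrow Table referenced while the C++ pointer is in use.
     py_table = polars_obj.to_arrow()
     cdef shared_ptr[CTable] table = pyarrow_unwrap_table(py_table)
     if table.get() == NULL:
         raise TypeError("to_arrow() did not return a pyarrow.Table")
     # table.get() is an arrow::Table*; walk its columns and chunks from here.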
     
Michael WS
1

The underlying data structures for polars are Apache Arrow arrays implemented in Rust, not C++ arrays. However, you can access them using the Python API. For example, to get the data of a polars Series as a NumPy array, you can use the to_numpy() method:

arr = some_series.to_numpy()

You can then iterate over the data using the for keyword:

for x in arr:
    pass  # do something with x

If you need the underlying Arrow array itself, you can use the to_arrow() method:

arrow_array = some_series.to_arrow()

You can then access individual elements using the [] operator (each element is a pyarrow scalar, so call as_py() to get a plain Python value):

x = arrow_array[0].as_py()
y = arrow_array[1].as_py()
c0d3x27
  • Create a new variable in the Python file that is the name of the C++ structure. You can then access the underlying structures through this new variable. – c0d3x27 Aug 04 '22 at 04:14
  • We can get pretty close by going through NumPy: with `s = pl.Series(range(10))` and `arr = s.to_numpy()`, `arr.__array_interface__['data']` is an `(address, read_only)` tuple, and `ctypes.c_void_p(arr.__array_interface__['data'][0])` wraps that address as a pointer. – c0d3x27 Aug 04 '22 at 04:19
  • So we can get a `ctypes.c_void_p` for the memory behind a `polars.Series` from the `__array_interface__` dictionary of its NumPy conversion, and `.value` turns it back into a Python `int`. Keep in mind that a `polars.Series` is not a `numpy.ndarray`, so many `ndarray` methods and attributes will not work on it directly, and that `to_numpy` may copy rather than share memory depending on the dtype and polars version. – c0d3x27 Aug 04 '22 at 04:20