
I convert an Arrow table to pandas with "zero copy", but the resulting object is not aligned.

import pyarrow.parquet as pq

# Create a pyarrow Table from a Parquet file
pq_file = pq.ParquetFile(parquet_file_name)
arrow_table = pq_file.read()

# Convert the pyarrow Table to pandas with zero copy
df = arrow_table.to_pandas(zero_copy_only=True)

# Check whether the NumPy array is aligned on a 64-byte boundary
print("alignment: {}".format(df.as_matrix().__array_interface__['data'][0] % 64))

The code prints: alignment: 16

Conclusion: the NumPy array is not aligned. Since I converted the pyarrow.table.Table to pandas with "zero copy", I conclude that the pyarrow.table.Table itself is not aligned. Where am I wrong?

Machavity

1 Answer


Response from uwe:

  1. I'm not sure the zero_copy_only flag works correctly in Arrow 0.8; we have recently made some fixes, but they have not been released yet.
  2. The upcoming release adds a new buffers property, which lets you check the memory address of the PyArrow array. This is useful for verifying in Python that the zero copy really was zero copy.
  3. Be aware that df.as_matrix() may also make a copy if your DataFrame does not have a single dtype.
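To illustrate the kind of check discussed here, below is a minimal sketch using NumPy only. It is not from the original answer: the helper `alignment_offset` and the over-allocated buffer are hypothetical names chosen for the example. It shows that a view keeps the address (and so the alignment) of its source, while a dtype conversion allocates a new buffer, which is one way an apparently "zero copy" result can end up at a different, unaligned address.

```python
import numpy as np


def alignment_offset(arr, boundary=64):
    """Return the data pointer of `arr` modulo `boundary` (0 means aligned).

    Hypothetical helper for illustration; it reads the address from the
    NumPy array interface, as in the question.
    """
    address = arr.__array_interface__["data"][0]
    return address % boundary


# Over-allocate a byte buffer, then slice it so that the view starts
# on a 64-byte boundary.
raw = np.zeros(1024 + 64, dtype=np.uint8)
start = (-alignment_offset(raw)) % 64
aligned = raw[start:start + 1024].view(np.float64)

# A plain view shares memory with its source, so alignment is preserved.
view = aligned[:]

# Converting to another dtype allocates a new buffer, i.e. makes a copy.
converted = aligned.astype(np.float32)

print("aligned offset:", alignment_offset(aligned))
print("view shares memory:", np.shares_memory(view, aligned))
print("converted shares memory:", np.shares_memory(converted, aligned))
```

The same idea can be applied to the DataFrame case: if `np.shares_memory` reports no overlap between the converted array and the source, a copy was made, and the alignment of the result says nothing about the alignment of the original Arrow data.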