6

In pyarrow, what is the suggested way of writing a pyarrow.Tensor (e.g. created from a numpy.ndarray) to a Parquet file? Is it even possible without having to go through pyarrow.Table and pandas.DataFrame?

Martin Studer

2 Answers

8

The data model for Parquet is tabular, so at some point the tensor/ndarray has to be converted to a tabular form. We don't have any built-in convenience functions to help with this, but feel free to file specific feature requests on the issue tracker: https://issues.apache.org/jira/projects/ARROW

Wes McKinney
3

The Parquet format is optimised for tables, including tables with nested data, i.e. it expects the data to be represented as named columns. That is somewhat at odds with the idea of n-dimensional arrays. For tensors, it is better to choose a different format.

Uwe L. Korn