In pyarrow, what is the suggested way of writing a pyarrow.Tensor (e.g. created from a numpy.ndarray) to a Parquet file? Is it even possible without having to go through pyarrow.Table and pandas.DataFrame?
6

Martin Studer
It's been a while. Did you find some interesting way to achieve this, Martin? – Leo Gallucci Feb 12 '19 at 14:03
2 Answers
8
The data model for Parquet is tabular, so somewhere the tensor/ndarray must get converted to a tabular form. We don't have any built-in convenience functions to help with this, but feel free to make specific feature requests on the issue tracker https://issues.apache.org/jira/projects/ARROW

Wes McKinney
[ARROW-5645](https://issues.apache.org/jira/browse/ARROW-5645), [ARROW-5819](https://issues.apache.org/jira/browse/ARROW-5819) – Martin Thøgersen Jan 27 '22 at 13:32
3
The Parquet format is optimised for tables with nested data, i.e. it expects data to be represented as named columns. This is somewhat at odds with n-dimensional arrays, which have no natural column structure. For tensors, it is better to choose a different format.

Uwe L. Korn