If I write a simple Parquet file using the script simple-write-parquet.cpp, I expect to get a Parquet file with a single column, MyInt. The script attempts to add KeyValueMetadata to the field MyInt with some dummy values. In the C++ code, if I do
std::cout << field->ToString(true) << std::endl;
I see the expected output:
...
-- metadata --
foo: bar
bar: foo
and I expect this metadata to be preserved in the output Parquet file. However, when I read this file back using pyarrow, the field's key-value metadata does not seem to exist:
import pyarrow as pa
import pyarrow.parquet as pq
table = pq.read_table("test.parquet")
field = table.field("MyInt")
field.metadata # None!
Is there a way to retrieve, from within pyarrow, the KeyValueMetadata attached to both fields and the schema (e.g. via the WithMetadata methods) on the C++ side that writes the Parquet files to disk?