
I use Apache Parquet to create Parquet tables containing process information about a machine, and I need to store file-wide metadata (machine ID and machine name).

Parquet files are said to be capable of storing file-wide metadata; however, I couldn't find anything in the documentation about how to do it.

Another Stack Overflow post explains how to do this with pyarrow. As far as I understand that post, I need some kind of key-value pairs (maybe a map<string, string>) and have to attach them to the schema somehow.

I found a class in the Parquet source code called parquet::FileMetaData that might be used for this purpose, but there is nothing about it in the docs.

Is it possible to store file-wide metadata with C++?

Currently I am using the stream_reader_writer example to write Parquet files.

  • See this question https://stackoverflow.com/questions/68778638/ – 0x26res Dec 02 '21 at 10:13
  • Thanks for the comment, but that question doesn't solve my problem. It only shows how to store metadata for a field. What I want is to save file-wide metadata that isn't tied to a specific field. – globetrotter Dec 02 '21 at 11:14
  • It's not that different: you can assign metadata to the schema using https://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv46schemaNSt6vectorINSt10shared_ptrI5FieldEEEENSt10shared_ptrIK16KeyValueMetadataEE – 0x26res Dec 02 '21 at 11:45
  • The problem is that I use parquet::schema::GroupNode::Make() to create my schema (see: [stream_reader_writer_example](https://github.com/apache/arrow/blob/master/cpp/examples/parquet/parquet_stream_api/stream_reader_writer.cc) ), and it has no metadata parameter. – globetrotter Dec 02 '21 at 13:12
  • You can pass the metadata when calling "parquet::ParquetFileWriter::Open" https://github.com/apache/parquet-cpp/blob/master/src/parquet/file_writer.h#L149 – 0x26res Dec 02 '21 at 13:22
  • Thanks for the solution, this works. It's a pity these classes/functions aren't documented anywhere. If you write this up as an answer, I can accept it and upvote it. – globetrotter Dec 03 '21 at 14:18

1 Answer


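You can pass the file-level metadata when calling parquet::ParquetFileWriter::Open: its optional last parameter, key_value_metadata (a std::shared_ptr<const KeyValueMetadata>), is written into the file footer. See the declaration in file_writer.h: https://github.com/apache/parquet-cpp/blob/master/src/parquet/file_writer.h#L149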

0x26res