I'm looking for the proper way to write data to a Parquet file in C++. There seem to be two choices: writing directly to Parquet, or writing to Arrow and then to Parquet.

Is writing to Arrow and then converting to Parquet with `WriteTable` the preferred approach? Would performance or ease of use be a reason to write directly to Parquet with `ParquetFileWriter` or some other tool?

Looking first at the code, `ParquetFileWriter` seemed like the right choice, but its usage in the unit tests looked clunky.

Then I found the docs, which say to use the `WriteTable` free function. `WriteTable` takes an Apache Arrow `Table`, so it seems I must put my data into one first. I was taken aback at first, because that means I must open the lid on Arrow as well.

user2183336
  • Arrow is an in-memory format. Parquet is an on-disk format. You must first convert your data to an Arrow `Table` (or `RecordBatch`) and then write that to the disk using the `ParquetFileWriter`. What format is your data in now? There may be utilities available to convert from your current format to `Table`. – Pace Apr 12 '21 at 23:26
  • The data is just in a C++ class. Not sure one would say it currently exists in a format. – user2183336 Apr 13 '21 at 13:15
  • I wrote such data in C++ without Arrow, using the `StreamWriter`; see e.g. https://gitlab.ikp.kit.edu/AirShowerPhysics/corsika/-/blob/output_format_testing/Processes/ObservationPlane/ObservationPlaneParquet.cc This works very well, including in terms of performance. – Ralf Ulrich Apr 13 '21 at 17:11
  • There are examples of the low-level API (https://github.com/apache/arrow/tree/master/cpp/examples/parquet/low_level_api). You can also use the streaming API, which is what Ralf used (https://github.com/apache/arrow/tree/master/cpp/examples/parquet/parquet_stream_api). Both should let you create Parquet files without going to the Arrow format first. However, if you converted your data to the Arrow format, you would also be able to write Arrow IPC (Feather) and CSV (CSV writer coming in 4.0.0). So you have some choices. – Pace Apr 13 '21 at 18:38
