I am trying to read and write a trivial dataset in Julia. The dataset is `mtcars`, taken from R, with an arbitrarily added column `bt` containing random Boolean values. The file/folder structure (below) was written out using the R `arrow` package.
The files are laid out as follows:
    arr
    |-- bt=false
    |   `-- part-1.arrow
    `-- bt=true
        `-- part-0.arrow
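
As far as I understand, `bt` is not stored inside the `.arrow` files themselves; its value is encoded only in the Hive-style `key=value` directory names. To make explicit what I expect a reader to recover from the layout, here is a minimal sketch in Base Julia. The helper name `parse_partition` is hypothetical (it is not part of any package), and it assumes the value is a Boolean literal, as it is for `bt`:

```julia
# Hypothetical helper: turn a Hive-style directory name into a column => value pair.
# Assumes the part after "=" is a Bool literal ("true" or "false"), as for `bt`.
function parse_partition(dirname::AbstractString)
    key, value = split(dirname, "="; limit = 2)
    return Symbol(key) => parse(Bool, value)
end

parse_partition("bt=false")  # :bt => false
parse_partition("bt=true")   # :bt => true
```

Applied to each subdirectory of `arr`, this gives the `bt` value that should be attached to every row read from the files in that directory.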
How can I faithfully reproduce the original table in Julia?
What I've tried so far:
- Using the `Parquet.jl` package. Documentation suggests that it should automatically detect partitioning folder structure for columns of bool/string/date type. When I read the data in, using `read_parquet(path; kwargs)`, the resulting data structure does not have the `bt` column. I've tried setting the `column_generator` keyword argument to the default `Parquet.dataset_column_generator`, but this did not work.
- Using `Arrow.jl`. I cannot find a documented way (unless I misunderstood) to directly read in a partitioned data structure.
R does not generate additional metadata files to store the schema, but I understand this is optional and not part of the Arrow spec. Is that correct?