Issue with writing Parquet Files via Arrow Package in R

Question

Is there a difference in how the arrow package's write_parquet() and read_parquet() functions behave in R on Windows versus Linux?

Example code (put anything in the data frame):

library(arrow)

mydata = data.frame(...)

write_parquet(mydata, 'mydata.parquet')

read_parquet('mydata.parquet')

When this code is run on Windows, the Parquet files can be read without problems on either Windows or Linux, and read_parquet() returns a data frame. But when the file is written on Linux and I then read it in R on Windows, I don't get a data frame. Instead I get a list in which each element is a vector holding the data for one column. I tried a workaround with do.call(rbind, ...) to convert the list back into a data frame, but the result has no column names.
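To illustrate why the do.call(rbind, ...) workaround loses the structure, here is a minimal base R sketch. The list `cols` and the names in `col_names` are hypothetical stand-ins for whatever read_parquet() actually returned; they are assumptions for illustration, not values from the real file.

```r
# Hypothetical stand-in for the list returned on Windows:
# one vector per column, with no names attribute.
cols <- list(c(1L, 2L, 3L), c("a", "b", "c"))

# rbind() treats each vector as a ROW and coerces everything to a
# single type, so this gives a character matrix, not a data frame.
m <- do.call(rbind, cols)

# Treating the vectors as columns keeps each column's own type.
# The names are assumed here and would have to be supplied by hand.
col_names <- c("id", "label")
df <- as.data.frame(setNames(cols, col_names), stringsAsFactors = FALSE)

str(df)
```

This only repairs the shape after the fact. It does not explain why the names were dropped in the first place, and the original column names still have to come from somewhere else.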

Please let me know if there is a way to resolve this. Ideally I'd like to write Parquet files on either OS and read them back into R as data frames on either OS. For reference, I'm on R 4.0 on both.

Thanks in advance.

Can you provide a minimal example parquet file that reproduces this behavior? If so, please make an issue at https://issues.apache.org/jira/browse/ARROW and attach it. — Neal Richardson, Jan 20 '21 at 19:47
I had a similar issue: a data.table object was saved using arrow version 2.0.0. When it was read back with the same version of arrow, all was well. When I used arrow version 1.0.0 to read it, the column names were gone, i.e., the 'names' attribute was NULL. Could it be that (1) versions 1.0.0 and 2.0.0 of arrow are not compatible, and (2) the version of arrow on the Linux machine is different from the one on the Windows machine? — amitr, Jul 08 '21 at 13:19
I've also had problems on a Mac. Once in a while, when I write a parquet file, I'm not able to read it 5 minutes later. My files are usually data.tables and the problem isn't related to the size of the dataset. I wrote to the maintainer, but didn't get a response. — David F, Feb 08 '22 at 15:07
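One way to test the version-mismatch idea raised in the comments is to run the same small check on both machines and compare the results. This is only a sketch: the mismatch is a hypothesis from the comments, not a confirmed cause, and the data frame used for the round trip is made up for illustration.

```r
# Run this on BOTH machines and compare the printed values.
env <- list(
  os    = Sys.info()[["sysname"]],
  r     = R.version.string,
  arrow = if (requireNamespace("arrow", quietly = TRUE)) {
    as.character(utils::packageVersion("arrow"))
  } else {
    NA_character_  # arrow is not installed on this machine
  }
)
print(env)

# If arrow is available, round-trip a small made-up data frame and
# inspect the class and column names of what comes back.
if (!is.na(env$arrow)) {
  path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), path)
  back <- arrow::read_parquet(path)
  print(class(back))
  print(names(back))
}
```

If the arrow versions differ between the two machines, installing the same version on both and repeating the cross-OS read would show whether the version, rather than the operating system, is what matters.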

