How can I write a pandas dataframe to disk in .arrow format? I'd like to be able to read the arrow file into Arquero as demonstrated here.
4

RobinL
3 Answers
7
Since Feather (version 2) is the Arrow IPC file format, you can probably just use write_feather. See http://arrow.apache.org/docs/python/feather.html

Neal Richardson
Interesting - I will try this. I knew they were very similar! – RobinL Nov 02 '20 at 16:15
Thank you. I can confirm that `feather.write_feather(table, 'file.feather', compression='uncompressed')` works with Arquero, as does saving to `arrow` using `pa.ipc.new_file`. The uncompressed feather file is about 10% larger on disk than the `.arrow` file. A compressed feather file cannot be read using the same methodology: see https://observablehq.com/d/298f76ea5f91b5fe, which is taken from https://observablehq.com/@uwdata/arquero-and-apache-arrow. I've uploaded the files here: https://github.com/RobinL/arrow_test – RobinL Nov 02 '20 at 17:54
Correct, the JS implementation of Arrow hasn't added support for compressed feather/arrow files, so you'll need to write them uncompressed. – Neal Richardson Nov 03 '20 at 17:49
Cheers guys - I couldn't find anything about this in the Arrow docs, notebooks or mailing lists; is it documented anywhere? I'm assuming this is related to the lack of browser support for JavaScript compression/decompression. – nite Dec 15 '20 at 21:54
4
You can do this as follows:
import pandas
import pyarrow

# Load any DataFrame; a Parquet file is used here as an example source
df = pandas.read_parquet('your_file.parquet')
# Convert to an Arrow table, dropping the pandas index
table = pyarrow.Table.from_pandas(df, preserve_index=False)
sink = "myfile.arrow"
# Note new_file creates a RecordBatchFileWriter; the with block closes it
with pyarrow.ipc.new_file(sink, table.schema) as writer:
    writer.write(table)
1
Pandas can write a DataFrame directly to the binary Feather format (this uses pyarrow):
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
df.to_feather('my_data.arrow')
Additional keywords are passed to pyarrow.feather.write_feather(). This includes the compression, compression_level, chunksize and version keywords.

ns15