
How can I write a pandas dataframe to disk in .arrow format? I'd like to be able to read the arrow file into Arquero as demonstrated here.

RobinL

3 Answers


Since Feather is the Arrow IPC format, you can probably just use write_feather. See http://arrow.apache.org/docs/python/feather.html

Neal Richardson
  • Interesting - I will try this. I knew they were very similar! – RobinL Nov 02 '20 at 16:15
  • Thank you. I can confirm that `feather.write_feather(table, 'file.feather', compression='uncompressed')` works with Arquero, as does saving to `.arrow` using `pa.ipc.new_file`. The uncompressed Feather file is about 10% larger on disk than the `.arrow` file. A compressed Feather file cannot be read using the same methodology: https://observablehq.com/d/298f76ea5f91b5fe (taken from https://observablehq.com/@uwdata/arquero-and-apache-arrow). I've uploaded the files here: https://github.com/RobinL/arrow_test – RobinL Nov 02 '20 at 17:54
  • 2
    Correct, the JS implementation of Arrow hasn't added support for compressed feather/arrow files, so you'll need to write them uncompressed. – Neal Richardson Nov 03 '20 at 17:49
  • Cheers guys - couldn't find anything in the arrow docs, notebooks or mailing lists on this, is there anywhere? I'm assuming this is related to the lack of browser support for javascript compression/decompression. – nite Dec 15 '20 at 21:54

You can do this as follows:

import pandas
import pyarrow

df = pandas.read_parquet('your_file.parquet')

table = pyarrow.Table.from_pandas(df, preserve_index=False)

sink = "myfile.arrow"

# Note: new_file creates a RecordBatchFileWriter for the Arrow IPC file format
with pyarrow.ipc.new_file(sink, table.schema) as writer:
    writer.write_table(table)
buhtz
RobinL

Pandas can write a DataFrame directly to the binary Feather format (it uses pyarrow under the hood):

import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
df.to_feather('my_data.arrow')

Additional keyword arguments are passed through to pyarrow.feather.write_feather(); this includes the compression, compression_level, chunksize and version keywords.

ns15