
Using the IO tools in pandas it is possible to convert a DataFrame to an in-memory feather buffer:

import pandas as pd  
from io import BytesIO 

df = pd.DataFrame({'a': [1,2], 'b': [3.0,4.0]})  

buf = BytesIO()

df.to_feather(buf)

However, using the same buffer to convert back to a DataFrame

pd.read_feather(buf)

results in an error:

ArrowInvalid: Not a feather file

How can a DataFrame be converted to an in-memory feather representation and, correspondingly, back to a DataFrame?

Thank you in advance for your consideration and response.

Ramón J Romero y Vigil
  • @EdChum The documentation explicitly names the variable `path`, which would suggest it was deliberate, since all of the other methods name the variable `filepath_or_buffer`. – Ramón J Romero y Vigil Jun 08 '18 at 13:37
  • Hmm, could you try `buf = io.BytesIO()` – EdChum Jun 08 '18 at 13:42
  • @EdChum That seems to have worked! – Ramón J Romero y Vigil Jun 08 '18 at 13:45
  • Looking at the implementation, it accepts a file path, so it will also accept a file-like object. I tried `buf = io.BytesIO()`, but I don't have the `feather-format` library installed, so I'm waiting for `pip` to finish before confirming – EdChum Jun 08 '18 at 13:47
  • This does seem to work but I'm not familiar with feather files so can't confirm if all is OK – EdChum Jun 08 '18 at 13:59
  • @EdChum I tried to verify by converting the feather back to a dataframe but got another error. Updated the question accordingly. – Ramón J Romero y Vigil Jun 08 '18 at 14:18
  • I get the same problem. I haven't investigated how to convert the bytes object to a file-like object so that pandas can read it again. – EdChum Jun 08 '18 at 14:20
  • I think this may be something to ask about on [github](https://github.com/pandas-dev/pandas/issues), as it may be functionality that could be added – EdChum Jun 11 '18 at 08:37



With pandas==0.25.2 this can be accomplished in the following way:

import pandas
import io
df = pandas.DataFrame(data={'a': [1, 2], 'b': [3.0, 4.0]})
buf = io.BytesIO()
df.to_feather(buf)
output = pandas.read_feather(buf)

Then a call to output.head(2) returns:

   a    b
0  1  3.0
1  2  4.0

Note that you could do the same with CSV files, but that requires using StringIO instead of BytesIO.
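A minimal sketch of the CSV variant, assuming the same two-column frame. CSV is text, so it goes into a StringIO; the buffer has to be rewound before read_csv, which reads from the current position.

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

# CSV is text, so write it into a StringIO rather than a BytesIO.
text_buf = io.StringIO()
df.to_csv(text_buf, index=False)  # index=False: don't write the RangeIndex as a column

# Rewind to the start before handing the buffer to read_csv.
text_buf.seek(0)
csv_output = pd.read_csv(text_buf)

print(csv_output)
```

Without index=False, the index would come back as an extra unnamed column.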


If your DataFrame has a non-default index (for example a MultiIndex, or any index other than the default 0..n-1 RangeIndex), you may see an error like

ValueError: feather does not support serializing <class 'pandas.core.indexes.base.Index'> for the index; you can .reset_index()to make the index into column(s)

In that case, call .reset_index() before to_feather to turn the index into ordinary columns, and call .set_index([...]) after read_feather to restore it.


One last thing: if you pass the BytesIO on to something else after writing (for example, uploading it), you need to seek back to position 0 first, because writing the feather bytes leaves the stream position at the end of the data. For example:

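import io
import boto3

s3_client = boto3.client('s3')  # assumes AWS credentials are already configured
buffer = io.BytesIO()
df.reset_index(drop=False).to_feather(buffer)  # move the index into a column first
buffer.seek(0)  # rewind so the upload starts from the first byte
s3_client.put_object(Body=buffer, Bucket='bucket', Key='file')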
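The stream-position behaviour behind this can be seen with plain io.BytesIO, independent of pandas and S3. A consumer that reads from the current position gets nothing right after a write, which is presumably why the upload above needs the rewind. The payload here is a placeholder, not real feather data.

```python
import io

payload = b'placeholder bytes'  # stand-in for real feather output
buffer = io.BytesIO()
buffer.write(payload)

# Writing leaves the position at the end of the data, so reading
# from the current position returns no bytes at all.
position_after_write = buffer.tell()
unread_before_seek = buffer.read()

# Rewinding to 0 makes the whole payload readable again.
buffer.seek(0)
unread_after_seek = buffer.read()

print(position_after_write, unread_before_seek, unread_after_seek)
```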
luksfarris