
I have a data frame, let's say:

import pandas as pd
df = pd.DataFrame({'a': [1, 4], 'b': [1, 3]})

I want to save it as a Feather file to S3, but I can't find a working way to do it.

I tried to use s3bp and s3fs, but neither did the trick.

Any suggestion?

Community
amarchin

3 Answers


The solution that worked for me is:

import boto3
import pandas as pd

from io import BytesIO
from pyarrow.feather import write_feather

df = pd.DataFrame({'a': [1, 4], 'b': [1, 3]})

# Write the Feather bytes into an in-memory buffer, then upload with boto3
s3_resource = boto3.resource('s3')
with BytesIO() as f:
    write_feather(df, f)
    s3_resource.Object('bucket-name', 'file_name').put(Body=f.getvalue())
amarchin

You can use storefact / simplekv for this without writing to disk.

import pyarrow as pa
from pyarrow.feather import write_feather
import storefact

df = …
store = storefact.get_store('hs3', host="…", bucket="…", access_key="…", secret_key="…")
buf = pa.BufferOutputStream()
write_feather(df, buf)
# `put` expects raw bytes, so convert the Arrow buffer to Python bytes first
store.put('filename.feather', buf.getvalue().to_pybytes())
Uwe L. Korn

A simple solution with just PyArrow and pandas:

import pandas as pd
import pyarrow.feather
from pyarrow import fs

# PyArrow's native S3 filesystem; credentials are picked up from the
# standard AWS environment variables or config files
s3 = fs.S3FileSystem(region='us-east-1')

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

with s3.open_output_stream('my-bucket/path/to.feather') as f:
    pyarrow.feather.write_feather(df, f)