I am using an API Gateway proxy for S3 to read Feather files. Below is the simplest form of the code I am using.

import pandas as pd

s3_data = pd.read_feather('https://<api_gateway>/<bucket_name>/data.feather')

This gives an error:

   reader = _feather.FeatherReader(source, use_memory_map=memory_map)
  File "pyarrow\_feather.pyx", line 75, in pyarrow._feather.FeatherReader.__cinit__
  File "pyarrow\error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 114, in pyarrow.lib.check_status
OSError: Verification of flatbuffer-encoded Footer failed.

If I keep the Feather file on my local machine and read it as below, everything works fine.

s3_data=pd.read_feather("file://localhost//C://Users//<Username>//Desktop//data.feather")

How do I make this work?

Naxi

1 Answer

Maybe the gateway proxy does some redirection, and that makes the read fail. I would read from S3 directly instead, something like this:

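import pandas as pd
from s3fs import S3FileSystem

# anon=True means unauthenticated access, so this assumes a public bucket
fs = S3FileSystem(anon=True)
# Open the object in binary mode and pass the file object to pandas
with fs.open("<bucket>/data.feather", "rb") as f:
    df = pd.read_feather(f)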

s3fs is part of the fsspec project, which grew out of the Dask ecosystem. There are also other similar layers that you can use.
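
If you need to keep going through the gateway, it may help to first check whether the proxy returns the file bytes intact. The sketch below is only a diagnostic, not a confirmed fix: it assumes the URL can be fetched with a plain GET, and the helper names `looks_like_feather` and `read_feather_via_http` are hypothetical. It relies on the fact that a Feather V2 (Arrow IPC) file starts and ends with the marker `ARROW1`, and a legacy Feather V1 file with `FEA1`.

```python
import io
import urllib.request

# Magic markers: Feather V2 (Arrow IPC file) starts and ends with b"ARROW1";
# the legacy Feather V1 format uses b"FEA1".
_MAGICS = (b"ARROW1", b"FEA1")


def looks_like_feather(data: bytes) -> bool:
    """Return True if data starts and ends with the same Feather magic marker."""
    return any(
        len(data) >= 2 * len(magic)
        and data.startswith(magic)
        and data.endswith(magic)
        for magic in _MAGICS
    )


def read_feather_via_http(url: str):
    """Hypothetical helper: download the whole file, sanity-check it, then parse it."""
    import pandas as pd  # imported here so the byte check above works without pandas

    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if not looks_like_feather(data):
        # Show the first bytes: base64 text or an XML/JSON error body would
        # suggest the gateway is not passing the binary payload through intact.
        raise ValueError(f"response is not a Feather file; starts with {data[:16]!r}")
    return pd.read_feather(io.BytesIO(data))
```

If the check fails and the payload looks like base64 text or an error document, the gateway's handling of binary responses would be the first thing I would look at.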

PS: if you are using Feather for long-term data storage, the Apache Arrow project (which maintains Feather) advises against it. You should probably use Parquet.

suvayu