I'm trying to read a file with pandas from an S3 bucket without downloading it to disk. I've tried boto3:
```python
import io

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket_name', Key='key')
read_file = io.BytesIO(obj['Body'].read())
df = pd.read_csv(read_file)
```
I've also tried s3fs:
```python
import s3fs
import pandas as pd

fs = s3fs.S3FileSystem(anon=False)
with fs.open('bucket_name/path/to/file.csv', 'rb') as f:
    df = pd.read_csv(f)
```
The issue is that the read is very slow: about 3 minutes for a 38 MB file. Is it supposed to be like that? If it is, is there a faster way to do the same thing? If it isn't, any suggestions on what might be causing it?
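To narrow it down, one check I can think of is timing `pd.read_csv` on a purely local, in-memory buffer of comparable size (the data below is synthetic, just for the test), which isolates pandas' parsing cost from the network transfer:

```python
import io
import time

import pandas as pd

# Build an in-memory CSV of a few hundred thousand rows of synthetic
# data, roughly approximating the size of the real file.
rows = ['col_a,col_b,col_c']
rows += [f'{i},{i * 2},value_{i}' for i in range(300_000)]
buf = io.BytesIO('\n'.join(rows).encode('utf-8'))

# Time only the parsing step.
start = time.perf_counter()
df = pd.read_csv(buf)
elapsed = time.perf_counter() - start

print(f'parsed {len(df)} rows in {elapsed:.2f}s')
```

If parsing a buffer like this completes quickly, the 3 minutes must be going into the S3 transfer rather than into pandas itself.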
Thanks!