I'm trying to read a file with pandas from an S3 bucket without downloading it to disk. I've tried boto3:
```python
import io

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket_name', Key='key')
read_file = io.BytesIO(obj['Body'].read())
df = pd.read_csv(read_file)
```
I've also tried s3fs:
```python
import s3fs
import pandas as pd

fs = s3fs.S3FileSystem(anon=False)
with fs.open('bucket_name/path/to/file.csv', 'rb') as f:
    df = pd.read_csv(f)
```
The issue is that the read is very slow: about 3 minutes for a 38 MB file. Is it supposed to be like that? If it is, is there a faster way to do the same thing? If it isn't, any suggestions on what might be causing it?
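To narrow it down, one check I can think of is timing `pd.read_csv` on a purely local, in-memory buffer of comparable size (the data below is synthetic, just for the test), which isolates pandas' parsing cost from the network transfer:

```python
import io
import time

import pandas as pd

# Build an in-memory CSV of a few hundred thousand rows of synthetic
# data, roughly approximating the size of the real file.
rows = ['col_a,col_b,col_c']
rows += [f'{i},{i * 2},value_{i}' for i in range(300_000)]
buf = io.BytesIO('\n'.join(rows).encode('utf-8'))

# Time only the parsing step.
start = time.perf_counter()
df = pd.read_csv(buf)
elapsed = time.perf_counter() - start

print(f'parsed {len(df)} rows in {elapsed:.2f}s')
```

If parsing a buffer like this completes quickly, the 3 minutes must be going into the S3 transfer rather than into pandas itself.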
Thanks!