Python: Read CSV file without Pandas from S3 bucket

Question

My goal is to read a CSV file that's located in an S3 bucket. The file has the following columns: event_id, ds, yhat, yhat_lower, yhat_upper. I found the following example here:

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))

However, that example doesn't show how to read the file directly from an S3 bucket. This is how I currently try to access the file:

import boto3

BUCKET_NAME = 'fbprophet'
FORECAST_DATA_OBJECT = 'forecast.csv'

# 's3' is the service name; the client is created with explicit credentials.
# `settings` is assumed to be the Django project settings.
s3 = boto3.client(
    's3',
    aws_access_key_id=settings.ML_AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.ML_AWS_SECRET_ACCESS_KEY,
)
# get_object returns a dict; the file content is the stream under 'Body'
csv_obj = s3.get_object(Bucket=BUCKET_NAME, Key=FORECAST_DATA_OBJECT)

Update:

obj = s3.get_object(Bucket=BUCKET_NAME, Key=FORECAST_DATA_OBJECT)
data = obj['Body'].read()

spamreader = csv.reader(data, delimiter=' ', quotechar='|')
for row in spamreader:
    print(', '.join(row))
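
The update above hands raw bytes to `csv.reader`, which expects an iterable of strings. A minimal sketch of one way around this, assuming the object is UTF-8 encoded, comma-delimited (as the file content quoted in the comments suggests), and small enough to hold in memory. The helper names `parse_csv_bytes` and `read_csv_from_s3` are hypothetical, and the client is passed in as an argument so the parsing step can be tried without AWS access.

```python
import csv
import io


def parse_csv_bytes(data, encoding="utf-8", delimiter=","):
    """Decode raw CSV bytes and return the rows as lists of strings."""
    # csv.reader needs text, so decode first and wrap the string in a
    # file-like object; newline="" leaves line endings to the csv module.
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def read_csv_from_s3(s3_client, bucket, key):
    """Fetch an object with a boto3 S3 client and parse it as CSV."""
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return parse_csv_bytes(obj["Body"].read())


# Sample bytes copied from the file content quoted in the comments.
sample = (
    b",event_id,ds,yhat,yhat_lower,yhat_upper\n"
    b"0,277,2019-09-04 07:14:08.051643,0.3054256311115928,"
    b"0.29750667741533227,0.31441960581142636\n"
)
rows = parse_csv_bytes(sample)
for row in rows:
    print(", ".join(row))
```

With the client from the question, this would be called as `read_csv_from_s3(s3, BUCKET_NAME, FORECAST_DATA_OBJECT)`.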

well ... what _is_ `csv_obj`? a file stream? if so- `spamreader = csv.reader(csv_obj, delimiter=' ', quotechar='|')` _could_ work... — Patrick Artner, Sep 07 '19 at 18:20
I tried that but I get `TypeError: expected str, bytes or os.PathLike object, not dict` — Joey Coder, Sep 07 '19 at 18:25
`spamreader = csv.reader(io.BytesIO(obj['Body'].read()), delimiter=' ', quotechar='|')` - see the dupe reading an excel file from S3 - same difference. pandas takes a stream as well - same as `csv.reader( stream, ...)` - needs an `import io` — Patrick Artner, Sep 07 '19 at 18:31
I now tried several iterations and variants. The closest I could get is the *Update* in my original post. However, what I don't understand is that I get an error but then it prints the content anyway in my console: `Traceback (most recent call last): File "", line 1, in FileNotFoundError: [Errno 2] No such file or directory: b',event_id,ds,yhat,yhat_lower,yhat_upper\n0,277,2019-09-04 07:14:08.051643,0.3054256311115928,0.29750667741533227,0.31441960581142636\n'` Starting with `event_id` [...] that's actually the content in the file. — Joey Coder, Sep 07 '19 at 18:53
I thought the same but I work with the Django manage.py shell. And I just restarted it and only worked with these few lines. It seems to be a problem of the combination of how I try to access the data. — Joey Coder, Sep 07 '19 at 19:09
`csv.reader(io.BytesIO(csvfile), delimiter=' ', quotechar='|')` - why `csvfile` - use `data` instead and without `with open(data, newline='') as csvfile:` ... — Patrick Artner, Sep 07 '19 at 19:16
Changed it to that (see Update) but now I get `_csv.Error: iterator should return strings, not int (did you open the file in text mode?)`. Sorry, it seems I am really stuck here. — Joey Coder, Sep 07 '19 at 19:31
None of the above solutions worked for me. I keep getting "can only concatenate str (not "list") to str" when I use "csv.reader()". I tested with all the solutions mentioned above and get same error. — Shankar Naru, Jul 16 '20 at 18:43
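
The `iterator should return strings, not int` error reported above comes from iterating over a `bytes` object, which yields integers rather than lines of text. A sketch of a streaming alternative, assuming UTF-8 content: `codecs.getreader` wraps a binary file-like object (such as the `Body` returned by `get_object`) so that `csv.DictReader` receives decoded lines. An `io.BytesIO` stands in for the S3 body here, and `iter_csv_dicts` is a hypothetical helper name.

```python
import codecs
import csv
import io


def iter_csv_dicts(binary_stream, encoding="utf-8"):
    """Yield one dict per CSV row from a binary file-like object."""
    # Decode incrementally instead of reading the whole object up front.
    text_stream = codecs.getreader(encoding)(binary_stream)
    yield from csv.DictReader(text_stream)


# Iterating over bytes yields ints, which is what csv.reader rejected.
byte_values = list(b"a,b")

# Stand-in for obj['Body']; the bytes are the content quoted in the comments.
body = io.BytesIO(
    b",event_id,ds,yhat,yhat_lower,yhat_upper\n"
    b"0,277,2019-09-04 07:14:08.051643,0.3054256311115928,"
    b"0.29750667741533227,0.31441960581142636\n"
)
records = list(iter_csv_dicts(body))
for record in records:
    print(record["event_id"], record["ds"], record["yhat"])
```

Each record is keyed by the header row, so the unnamed first column of the file ends up under the empty-string key.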
