11

I have uploaded an excel file to AWS S3 bucket and now I want to read it in python. Any help would be appreciated. Here is what I have achieved so far,

import boto3
import os

aws_id = 'aws_id'
aws_secret = 'aws_secret_key'

client = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
bucket_name = 'my_bucket'
object_key = 'my_excel_file.xlsm'
object_file = client.get_object(Bucket=bucket_name, Key=object_key)
body = object_file['Body']
data = body.read()

What do I need to do next in order to read this data and work on it?

exan
  • 3,236
  • 5
  • 24
  • 38

4 Answers4

25

Spent quite some time on it and here's how I got it working,

import boto3
import io
import pandas as pd
import json

aws_id = ''
aws_secret = ''
bucket_name = ''
object_key = ''

s3 = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
obj = s3.get_object(Bucket=bucket_name, Key=object_key)
data = obj['Body'].read()
df = pd.read_excel(io.BytesIO(data), encoding='utf-8')
exan
  • 3,236
  • 5
  • 24
  • 38
  • 3
    I tried the above, but received this error: ```TypeError: expected str, bytes or os.PathLike object, not NoneType```. Any clue? – SaTa Mar 14 '19 at 18:52
  • @exan how can this be done for multiple sheets / specific sheet? – Mohseen Mulla Apr 20 '22 at 12:11
  • 1
    I tried the above, but it seems read_excel has recently changed its inputs and that `encoding` is not allowed anymore. I got: `TypeError: read_excel() got an unexpected keyword argument 'encoding'` – Revolucion for Monica Aug 05 '22 at 16:14
  • 1
    Encoding is not a valid argument to read_excel. This could be a previous version solution. – abhishek kumar Jan 10 '23 at 07:08
4

You can directly read excel files using awswrangler.s3.read_excel. Note that you can pass any pandas.read_excel() arguments (sheet name, etc) to this.

import awswrangler as wr
df = wr.s3.read_excel(path=s3_uri)
milihoosh
  • 547
  • 5
  • 10
2

You can directly read xls file from S3 without having to download or save it locally. xlrd module has a provision to provide raw data to create workbook object. Following is the code snippet.

from boto3 import Session  
from xlrd.book import open_workbook_xls  

aws_id = ''    
aws_secret = ''
bucket_name = ''
object_key = ''

s3_session = Session(aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
bucket_object = s3_session.resource('s3').Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()
workbook = open_workbook_xls(file_contents=content)
Rhythm Chopra
  • 103
  • 3
  • 9
-3

Python doesn't support excel files natively. You could use the pandas library pandas library read_excel functionality

sshevlyagin
  • 1,310
  • 2
  • 16
  • 26
  • Apparently files reterived from S3 using boto3 library are encoded in some particular format. I am not able to read them in using the read_excel functions without knowing the decoding format – exan Nov 23 '18 at 04:29
  • @exan Is that with the additional library? If so, please edit your question to show your updated code. (S3 does not change file contents.) – John Rotenstein Nov 23 '18 at 05:23
  • @JohnRotenstein: Not sure what you mean mate, can you kindly elaborate a bit. I have uploaded a solution though. – exan Nov 23 '18 at 05:52