I was using this code to process an Excel file in Python on my local machine, where input_dir was my input directory and file was the name of the file I wanted to grab from that directory:
import csv
import openpyxl
import pandas as pd

input_file = input_dir + file

def excel_to_csv(input_file):
    # open the workbook and store the Excel object
    excel = openpyxl.load_workbook(input_file)
    # select the "PUBLISH" sheet
    sheet = excel["PUBLISH"]
    # create a writer object and write each row to the CSV
    with open("tt.csv", "w", newline="") as f:
        col = csv.writer(f)
        for r in sheet.rows:
            col.writerow([cell.value for cell in r])

# convert the workbook to CSV, then read the CSV into a dataframe
excel_to_csv(input_file)
jpm = pd.read_csv("tt.csv", header=11, usecols=[*range(1, 16)])
However, when I tried to migrate this to AWS using an S3 bucket as the source directory, the code fails. I know this is because I need to read the file into an io.BytesIO object, but I am very much unversed in AWS and am not sure how to use openpyxl and csv.writer in that environment.
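Based on what I've read so far, something like the sketch below is what I think I need. The bucket name and object keys are placeholders for my actual S3 locations, and I'm not certain the csv.writer part is the right approach:

import io
import csv
import boto3
import openpyxl

# placeholders for my actual bucket and object key
BUCKET = "my-bucket"
KEY = "input/workbook.xlsx"

s3 = boto3.client("s3")

# download the workbook into memory and wrap it in BytesIO
# so openpyxl can read it like a local file
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
excel = openpyxl.load_workbook(io.BytesIO(obj["Body"].read()))
sheet = excel["PUBLISH"]

# csv.writer needs a text stream, so write to a StringIO buffer
# instead of a local file
buffer = io.StringIO()
col = csv.writer(buffer)
for r in sheet.rows:
    col.writerow([cell.value for cell in r])

# upload the CSV back to the bucket (output key is a placeholder)
s3.put_object(Bucket=BUCKET, Key="output/tt.csv", Body=buffer.getvalue().encode("utf-8"))

I'm also wondering whether the CSV round trip is even necessary here, or whether I could just seek the StringIO buffer back to 0 and pass it straight to pd.read_csv without touching S3 again.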