I was using this code to process an Excel file in Python on my local machine, where input_dir was my input directory and file was the name of the file I wanted to grab from that directory:
import csv
import openpyxl
import pandas as pd

input_file = input_dir + file

def excel_to_csv(input_file):
    # open the workbook and store the Excel object
    excel = openpyxl.load_workbook(input_file)
    # select the "PUBLISH" sheet
    sheet = excel["PUBLISH"]
    # create a writer object and write each row to the CSV
    with open("tt.csv", "w", newline="") as f:
        col = csv.writer(f)
        for r in sheet.rows:
            col.writerow([cell.value for cell in r])

# convert the workbook to CSV, then read the CSV into a dataframe
excel_to_csv(input_file)
jpm = pd.read_csv("tt.csv", header=11, usecols=[*range(1, 16)])
However, when I tried to migrate this to AWS using an S3 bucket as the source directory, the code fails. I know this is because I need to read the file into an io.BytesIO object, but I am very much unversed in AWS and am not sure how to use openpyxl and csv.writer in that environment.
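Based on what I've read so far, something like the sketch below is what I think I need. The bucket name and object keys are placeholders for my actual S3 locations, and I'm not certain the csv.writer part is the right approach:

import io
import csv
import boto3
import openpyxl

# placeholders for my actual bucket and object key
BUCKET = "my-bucket"
KEY = "input/workbook.xlsx"

s3 = boto3.client("s3")

# download the workbook into memory and wrap it in BytesIO
# so openpyxl can read it like a local file
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
excel = openpyxl.load_workbook(io.BytesIO(obj["Body"].read()))
sheet = excel["PUBLISH"]

# csv.writer needs a text stream, so write to a StringIO buffer
# instead of a local file
buffer = io.StringIO()
col = csv.writer(buffer)
for r in sheet.rows:
    col.writerow([cell.value for cell in r])

# upload the CSV back to the bucket (output key is a placeholder)
s3.put_object(Bucket=BUCKET, Key="output/tt.csv", Body=buffer.getvalue().encode("utf-8"))

I'm also wondering whether the CSV round trip is even necessary here, or whether I could just seek the StringIO buffer back to 0 and pass it straight to pd.read_csv without touching S3 again.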