0

I am trying to load an xls file and convert to xlsx from an Amazon S3 bucket.

really appreciate if someone knows how to do:

  1. read xls file from s3
  2. convert xls to xlsx and save in s3.

Constrains: 1.This is an enterprise projet thus cannot be downloading the file to my local and convert and reupload 2. our internal application is running on EC2 linux, thus cannot install packages that works for windows like win32.client to do the conversion.

thank you!!

  • 1
    Why you try to open `s3_client` in `workbook = xlrd.open_workbook(s3_client)`? ;) – droebi Aug 30 '21 at 15:08
  • thank you @droebi, I have no experince on how to do this. could you provide some suggestion on how to open s3 xls file? what should be the correct tool? thanks! – Dwight schrute Aug 30 '21 at 15:10
  • 2
    Pass in the `binary_data` you're reading from S3 instead of the S3 client `s3_client` itself. – Anon Coward Aug 30 '21 at 15:11
  • thank you @AnonCoward, I tried to put the binary_data but got the error complaining [Errno 36] File name too long: b' I think it is taking the whole file as the file name – Dwight schrute Aug 30 '21 at 15:16
  • 1
    Possible duplicate https://stackoverflow.com/questions/20635778/using-openpyxl-to-read-file-from-memory – Mark B Aug 30 '21 at 15:24
  • HI @MarkB thank you for your help. I did try this function def load_excel_workbook(s3_bucket, s3_key): s3_client = boto3.client('s3', region_name='us-east-1') obj = s3_client.get_object(Bucket=s3_bucket, Key=s3_key) binary_data = obj['Body'].read() workbook = openpyxl.load_workbook(BytesIO(binary_data)) return workbook load_excel_workbook(s3_bucket,s3_key) but it only works for xlsx ( if someone want to load xlsx this works) for xls it return this error:' zipfile.BadZipFile: File is not a zip file – Dwight schrute Aug 30 '21 at 15:49
  • Please provide enough code so others can better understand or reproduce the problem. – Community Sep 01 '21 at 16:28

1 Answers1

1

Try following: When you run it inside AWS: Python: How to read and load an excel file from AWS S3?

When your run it local:

  1. Download this file:
  1. Convert xls to xlsx:

     def xls_to_xlsx(*args, **kw):
         book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
         book_xlsx = openpyxl.workbook.Workbook()
    
         sheet_names = book_xls.sheet_names()
         for sheet_index in range(len(sheet_names)):
             sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
             if sheet_index == 0:
                 sheet_xlsx = book_xlsx.active
                 sheet_xlsx.title = sheet_names[sheet_index]
             else:
                 sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
             for crange in sheet_xls.merged_cells:
                 rlo, rhi, clo, chi = crange
                 sheet_xlsx.merge_cells(start_row=rlo + 1, end_row=rhi,
                                        start_column=clo + 1, end_column=chi, )
    
             def _get_xlrd_cell_value(cell):
                 value = cell.value
                 if cell.ctype == xlrd.XL_CELL_DATE:
                     if value == 1.0:
                         datetime_tup = (0, 0, 0)
                     else:
                         datetime_tup = xlrd.xldate_as_tuple(value, 0)
    
                     if datetime_tup[0:3] == (0, 0, 0):
                         value = datetime.time(*datetime_tup[3:])
                     else:
                         value = datetime.datetime(*datetime_tup)
                 return value
    
             for row in range(sheet_xls.nrows):
                 sheet_xlsx.append((
                     _get_xlrd_cell_value(cell)
                     for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                 ))
    
         return book_xlsx
    
  2. Upload it back to S3:

droebi
  • 872
  • 1
  • 14
  • 27
  • doesnt AWS has internal function to convert file types? – Dwight schrute Aug 30 '21 at 16:39
  • thank you Droebi, one issue I am limited is that we are running the program on EC2 instance, this is enterprise project and I think it will be hard to down to my local and reupload. – Dwight schrute Aug 30 '21 at 16:41
  • I edited my answer maybe the first link helps you, if not leave a comment I will try to help you to solve this... – droebi Aug 30 '21 at 17:58