2

I'm trying to load xlsx files using pyodide. I able to load xls files correctly using the following code:

response = await fetch('${fileUrl}')
js_buffer = await response.arrayBuffer()
dFrame = pd.read_excel(BytesIO(js_buffer.to_py()))

However, it fails when I try to pass an xlsx file. I am not sure what's causing it. Here are potential issues that I already struck-out after testing:

  1. "openpyxl" not properly loaded - I used micropip to load openpyxl, and pd.read_excel('filename.xlsx') works when I manually place a valid xlsx file in the memory. So this isn't the issue.

Thanks for the help.

ste_kwr
  • 820
  • 1
  • 5
  • 21
  • 1
    Do you mean when you pass the URL of the file directly to `pd.read_excel`? – rth Nov 15 '21 at 16:53
  • Neither works. Passing the URL, or passing the above function. Even passing a remote online URL doesn't work. – ste_kwr Nov 15 '21 at 17:09
  • do you know where it fails ? The file gets downloaded and can be seen in the pyodide file system ? Then check the version of openpyxl maybe ? – Sylvain Nov 15 '21 at 20:24
  • It fails in the very last step of loading the file, though I was able to load the file directly using openpyxl and then convert it to dataframe which works. Added solution below. – ste_kwr Nov 16 '21 at 04:19

1 Answers1

2

So, I'm still not sure why directly read_excel or Excelfile functions don't work (especially since they just call openpyxl anyway), but I was able to get it to work by opening the file using openpyxl and then converting it to a dataframe. Working code below:

      response = await fetch('${fileUrl}')
      js_buffer = await response.arrayBuffer()
      wb = openpyxl.load_workbook(BytesIO(js_buffer.to_py()), data_only = True)
      ws = wb.worksheets[0]
      excel_data = ws.values
      columns = next(excel_data)[0:]
      dFrame = pd.DataFrame(excel_data , columns=columns)
ste_kwr
  • 820
  • 1
  • 5
  • 21