0

I am running a Google cloud function, which accesses the google cloud storage bucket for an excel file. My goal is to read the file and do some calculations with it, but is it possible without downloading the file to the /tmp folder? Things I have tried:

  storage_client = storage.Client()
  file_obj = storage_client.get_bucket(‘bucketname’).get_blob('filename.xlsx')
  
  excel_file = None
  file_obj.download_to_file(excel_file)
  wb = openpyxl.load_workbook(excel_file)

at first I thought I could attain a file object, but then after I read the error message I realised I was being asked for a file path so I would have to download to the /tmp folder, which is something I would like to avoid.

I also tried download_as_bytes(), but unfortunately openpyxl cannot read bytes.

Any help/hint would be appreciated :)

  • download_as_string? – guillaume blaquiere Jun 03 '21 at 18:31
  • @guillaumeblaquiere ummm... As I read from the documentation, it's an deprecated alias for download_as_bytes() https://googleapis.dev/python/storage/latest/blobs.html I tried nevertheless, and it's the same result as download_as_bytes: InvalidFileException: openpyxl does not support b'.xmlpk\x05\x06\x00\x00\x00\x00\x0c\x00\x0c\x00&\x03\x00\x00\x15\x97\x00\x00\x00\x00' file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm – user9882001 Jun 04 '21 at 06:49
  • Ok, and not to convert to string your byte and then load it? – guillaume blaquiere Jun 04 '21 at 06:56
  • @guillaumeblaquiere sorry... I don't understand what you mean... I didn't do any conversion when I do download_as_string or download_as_bytes – user9882001 Jun 04 '21 at 07:33
  • If you have bytes, you can encode("UTF-8") them to create a string – guillaume blaquiere Jun 04 '21 at 07:38
  • I see what you mean. I tried excel_file = file_obj.download_as_string().decode("utf-8") too but that doesn't work as well(unfortunately I didn't keep that error message, so I didn't know exactly the issue now...) – user9882001 Jun 04 '21 at 07:47
  • 1
    have you checked this [community answer](https://stackoverflow.com/a/20667890/12857703)? It explains that `openpyxl` expects a path or takes a file-like object, so if you have a bytes you can follow the example in that answer to your case. Do you think this will work for you case? – Ralemos Jun 04 '21 at 12:09
  • Dear Rafael: thank you! It worked perfectly. Please leave an answer below so I can accept the answer(and would be easier for other people to read it) – user9882001 Jun 08 '21 at 08:17
  • Awesome, I will add an answer :) – Ralemos Jun 08 '21 at 11:35

1 Answers1

1

As you can see in this Community Answer:

In the docs for openpyxl.load_workbook it says:

#:param filename: the path to open or a file-like object

So if you have bytes as the input you can assemble a "proto-object" to satisfy the parameter requirements of openpyxl.load_workbook and it will work, so like the example below:

from io import BytesIO
...
excel_file = None
file_obj.download_as_bytes(excel_file)
wb = load_workbook(filename=BytesIO(excel_file.read()))
Ralemos
  • 5,571
  • 2
  • 9
  • 18