Using openpyxl to read file from memory

Question

I downloaded a Google spreadsheet as an object in Python.

How can I use openpyxl to open the workbook without having to save it to disk first?

I know that xlrd can do this by:

book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read())

where "downloaded_spreadsheet" is my downloaded xlsx file as a file-like object.

Instead of xlrd, I want to use openpyxl because (from what I have read) it has better xlsx support.

I'm using this so far...

#!/usr/bin/python

import openpyxl
import xlrd
# which to use..?

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
        "Email": email, "Passwd": password,
        "service": service,
        "accountType": "HOSTED_OR_GOOGLE",
        "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="xls"):

        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
        "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
        "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    email = "........@gmail.com" # (your email here)
    password = '.....'
    spreadsheet_id = "......" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    downloaded_spreadsheet = gs.download(ss)


    # book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read(), formatting_info=True)

    # It works, but alas xlrd doesn't support the xlsx functionality that I want,
    # i.e. being able to read the cell color data.

I hope someone can help, because I have been struggling for months to get the color data from a given cell in a Google spreadsheet. (I know the Google API doesn't support it.)

score 98 · Answer 1 · edited Dec 06 '16 at 18:13

In the docs for load_workbook it says:

#:param filename: the path to open or a file-like object

So it was capable of this all along: it accepts either a path or a file-like object. I only had to convert the file-like object returned by urlopen into a byte stream with:

from io import BytesIO
from openpyxl import load_workbook

wb = load_workbook(filename=BytesIO(input_excel.read()))  # input_excel: object returned by urlopen

and I can read every piece of data in my Google-spreadsheet.
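
As a rough sketch of the same idea without any network access, the snippet below builds a small workbook in memory, loads it back from a BytesIO buffer, and reads one cell's value and fill color. The helper name `read_cell_from_bytes` and the sample cell contents are made up for illustration. It assumes the bytes are a valid xlsx file, since openpyxl does not read the older xls format.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill


def read_cell_from_bytes(xlsx_bytes, cell="A1"):
    """Hypothetical helper: load xlsx bytes from memory, return (value, fill color)."""
    wb = load_workbook(filename=BytesIO(xlsx_bytes))
    c = wb.active[cell]
    return c.value, c.fill.fgColor.rgb


# Build a sample workbook entirely in memory (stands in for the downloaded bytes).
sample = Workbook()
sample.active["A1"] = "hello"
sample.active["A1"].fill = PatternFill(fill_type="solid", fgColor="FFFF0000")
buffer = BytesIO()
sample.save(buffer)

value, color = read_cell_from_bytes(buffer.getvalue())
print(value, color)
```

The color here is an explicit ARGB string because the sample sets one. In a real spreadsheet, colors defined through a theme or a palette index may not come back in that form, so treat the `rgb` attribute with some care.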

edited Dec 06 '16 at 18:13

That1Guy

answered Dec 18 '13 at 20:36

Kaspar128

+1 - Made a similar mistake. I read only the first half and thought it can only read files. Now I went back and read it completely and saw that it can do file-like objects as well. – Karthic Raghupathi Sep 07 '14 at 05:39

score 17 · Answer 2 · edited Nov 07 '20 at 08:44

I was looking to load a file from a URL, and here is what I came up with:

util:

from io import BytesIO
import urllib.request

from openpyxl import load_workbook

def load_workbook_from_url(url):
    # Download the whole file into memory, then wrap the bytes in a file-like object
    data = urllib.request.urlopen(url).read()
    return load_workbook(filename=BytesIO(data))

usage:

import openpyxl_extended

book = openpyxl_extended.load_workbook_from_url('https://storage.googleapis.com/pnbx-cdn/pen-campaign/campaigner-template-fr.xlsx')
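
A possible variation on the helper above, offered as a sketch and not as part of the original answer: it closes the response explicitly and passes extra keyword arguments through to load_workbook. The name `fetch_workbook` and the default timeout of 30 seconds are assumptions chosen for illustration.

```python
from io import BytesIO
from urllib.request import urlopen

from openpyxl import load_workbook


def fetch_workbook(url, timeout=30, **load_kwargs):
    """Hypothetical variant: download an xlsx file and open it from memory."""
    # The with-block closes the response once the bytes have been read.
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Extra keyword arguments (e.g. data_only=True) go straight to load_workbook.
    return load_workbook(filename=BytesIO(data), **load_kwargs)
```

For example, `fetch_workbook(url, data_only=True)` should give the cached results of formula cells instead of the formula strings, provided the file was last saved by an application that stored them.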

edited Nov 07 '20 at 08:44

Dharman

answered Nov 07 '20 at 08:38

Emile Fyon

Great answer, clear and reusable. – David Dec 30 '21 at 22:20

score -15 · Answer 3 · answered Dec 16 '16 at 08:20

Actually, it is enough to do:

import openpyxl

file = open('path/to/file.xlsx', 'rb')
wb = openpyxl.load_workbook(filename=file)

and it will work. No need for BytesIO and stuff.

answered Dec 16 '16 at 08:20

PerBeatus

It's not being read from the file system as the question indicates. It's a stream. – swade Mar 15 '17 at 13:48

This would read from a file saved in the disk, not from memory. – FedericoG Aug 17 '18 at 13:27
