2

Primer: I'm extremely new to Python.

I am working on taking some Google Sheets data, and creating .xlsx sheets with it. With the code below, I am able to get Python to read the data and put it into arrays. However, I can't seem to figure out how to get openpyxl to write it to a document successfully. I'm assuming it has something to do with trying to write an array rather than iterating over rows that appear.

Any help/advice you all could provide would be greatly appreciated.

I get the following error when trying to run it:

Traceback (most recent call last):
  File "quickstart.py", line 94, in <module>
    main()
  File "quickstart.py", line 90, in main
    ws.append([values])
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 763, in append
    cell = Cell(self, row=row_idx, col_idx=col_idx, value=content)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 115, in __init__
    self.value = value
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 299, in value
    self._bind_value(value)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 206, in _bind_value
    raise ValueError("Cannot convert {0!r} to Excel".format(value))
ValueError: Cannot convert [['Student Name', 'Gender', 'Class Level', 'Home State', 'Major', 'Extracurricular Activity'], ['Alexandra', 'Female', '4. Senior', 'CA', 'English', 'Drama Club'], ['Andrew', 'Male', '1. Freshman', 'SD', 'Math', 'Lacrosse'], ['Anna', 'Female', '1. Freshman', 'NC', 'English', 'Basketball'], ['Becky', 'Female', '2. Sophomore', 'SD', 'Art', 'Baseball'], ['Benjamin', 'Male', '4. Senior', 'WI', 'English', 'Basketball'], ['Carl', 'Male', '3. Junior', 'MD', 'Art', 'Debate'], ['Carrie', 'Female', '3. Junior', 'NE', 'English', 'Track & Field'], ['Dorothy', 'Female', '4. Senior', 'MD', 'Math', 'Lacrosse'], ['Dylan', 'Male', '1. Freshman', 'MA', 'Math', 'Baseball'], ['Edward', 'Male', '3. Junior', 'FL', 'English', 'Drama Club'], ['Ellen', 'Female', '1. Freshman', 'WI', 'Physics', 'Drama Club'], ['Fiona', 'Female', '1. Freshman', 'MA', 'Art', 'Debate'], ['John', 'Male', '3. Junior', 'CA', 'Physics', 'Basketball'], ['Jonathan', 'Male', '2. Sophomore', 'SC', 'Math', 'Debate'], ['Joseph', 'Male', '1. Freshman', 'AK', 'English', 'Drama Club'], ['Josephine', 'Female', '1. Freshman', 'NY', 'Math', 'Debate'], ['Karen', 'Female', '2. Sophomore', 'NH', 'English', 'Basketball'], ['Kevin', 'Male', '2. Sophomore', 'NE', 'Physics', 'Drama Club'], ['Lisa', 'Female', '3. Junior', 'SC', 'Art', 'Lacrosse'], ['Mary', 'Female', '2. Sophomore', 'AK',
'Physics', 'Track & Field'], ['Maureen', 'Female', '1. Freshman', 'CA', 'Physics', 'Basketball'], ['Nick', 'Male', '4. Senior', 'NY', 'Art', 'Baseball'], ['Olivia', 'Female', '4. Senior', 'NC', 'Physics', 'Track & Field'], ['Pamela', 'Female', '3. Junior', 'RI', 'Math', 'Baseball'], ['Patrick', 'Male', '1. Freshman', 'NY', 'Art', 'Lacrosse'], ['Robert', 'Male', '1. Freshman', 'CA', 'English', 'Track & Field'], ['Sean', 'Male', '1. Freshman', 'NH', 'Physics', 'Track & Field'], ['Stacy', 'Female', '1. Freshman', 'NY', 'Math', 'Baseball'], ['Thomas', 'Male', '2. Sophomore', 'RI', 'Art', 'Lacrosse'], ['Will', 'Male', '4. Senior', 'FL', 'Math', 'Debate']] to Excel

Here's the code I have do far:

from __future__ import print_function
import httplib2
import oauth2client
import os
import googleapiclient
import openpyxl

from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage
from googleapiclient.discovery import build
from openpyxl import Workbook


""" This is the code to get raw data from a specific Google Sheet"""
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/sheets.googleapis.com-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret_noemail.json'
APPLICATION_NAME = 'Google Sheets API Python'


def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'sheets.googleapis.com-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else:  # Needed only for compatibility with Python 2.6
            credentials = tools.run_flow(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials


def main():
    """Shows basic usage of the Sheets API.

    Creates a Sheets API service object and prints the names and majors of
    students in a sample spreadsheet:
    https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    discoveryUrl = ('https://sheets.googleapis.com/$discovery/rest?'
                    'version=v4')
    service = build('sheets', 'v4', http=http,
                              discoveryServiceUrl=discoveryUrl)

    spreadsheetId = '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
    rangeName = 'Class Data!A:F'
    result = service.spreadsheets().values().get(
        spreadsheetId=spreadsheetId, range=rangeName).execute()
    values = result.get('values', [])

    if not values:
        print('No data found.')
    else:
        wb = Workbook()
        ws = wb.active  
        # Add new row at bottom
        ws.append([values])
        wb.save("users.xlsx")  # Write to disk

if __name__ == '__main__':
    main()

Update: when trying to do ws.append(values), I get the following error:

Traceback (most recent call last):
  File "quickstart.py", line 94, in <module>
    main()
  File "quickstart.py", line 90, in main
    ws.append(values)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 763, in append
    cell = Cell(self, row=row_idx, col_idx=col_idx, value=content)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 115, in __init__
    self.value = value
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 299, in value
    self._bind_value(value)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\cell\cell.py", line 206, in _bind_value
    raise ValueError("Cannot convert {0!r} to Excel".format(value))
ValueError: Cannot convert ['Student Name', 'Gender', 'Class Level', 'Home State', 'Major', 'Extracurricular Activity'] to Excel

Update with Pandas Code:

When leaving the index flag as true (unedited), this is the layout the data has:

"""
Using pandas and dataframes. For writing large amounts of data, this is probably the best way. 
"""
df=DataFrame(data=values)
df.to_excel('Pandas.xlsx')

Index Flag True

When marking the index flag as false, this is how the data looks:

"""
Using pandas and dataframes. For writing large amounts of data, this is probably the best way. 
"""
df=DataFrame(data=values)
df.to_excel('Pandas.xlsx', index=False)

Index Flag False

Update 2: Solved solution! The code below from Zyd and Ben.T completely solved my issues.

"""
Using pandas and dataframes. For writing large amounts of data, this is probably the best way. 
"""
df=DataFrame(data=values)
df.to_excel('Pandas.xlsx', header=False, index=False)

No Indexes

Thank you all for the help!

swolfe2
  • 429
  • 1
  • 6
  • 24
  • Maybe try just 'ws.append(values) instead of putting values in a list? You could also just build a pandas DataFrame with your records from the google spreadsheet and use the DataFrame.to_excel() method – zyd Mar 05 '18 at 19:15
  • Thanks for the reply, @zyd. I'm getting a different error message string now, and I've updated the OP at the bottom. – swolfe2 Mar 05 '18 at 19:19
  • Are you adding to an existing worksheet or creating a new workbook? I'm a bit confused because your comment says 'add new row at bottom' but it looks like you're creating a new workbook. – zyd Mar 05 '18 at 20:20
  • I'm trying to just create a new workbook with the data retrieved. I found that snippet from another website using openpyxl. – swolfe2 Mar 05 '18 at 20:24
  • Okay I'm going to add an answer in a little bit for you to try with pandas, gotta run to do something right now – zyd Mar 05 '18 at 20:33

3 Answers3

2

This may not be correct but my answer is a bit too much to put in a comment.

Where you're doing:

if not values:
    print('No data found.')
else:
    wb = Workbook()
    ws = wb.active  
    # Add new row at bottom
    ws.append([values])
    wb.save("users.xlsx")  # Write to disk

I would try

import pandas
if not values:
    print('No data found.')
else:
    dataframe = pandas.DataFrame([values])
    dataframe.to_excel('excel_out.xlsx')
zyd
  • 833
  • 7
  • 16
  • Thanks! Only one small tweak: it was printing the entire array on a single row. `if not values: print('No data found.') else: df = DataFrame(data=values) writer = pd.ExcelWriter('Data.xlsx') df.to_excel(writer, 'Raw Data') writer.save()` I tried making the index property false, which hides the row index. Is there a way to do the column index, too? Thanks so much! – swolfe2 Mar 05 '18 at 20:57
  • so you tried df.to_excel(writer, 'Raw Data', index=False)? I'm not sure I fully understand your question. Would it be possible for you to share a small sample of your data to better interpret how to remove the column index. It sounds like you might have a multi-indexed header? I think you should also be able to skip the pd.ExcelWriter and writer.save()step and just pass the output path to df.to_excel: df = DataFrame(data=values) df.to_excel('Data.xlsx', index=False) – zyd Mar 05 '18 at 22:36
  • 1
    Hey Zyd, sorry for the delay. I've updated the OP with images of what the data looks like when it finishes processing. The link to the Google Sheets data is [here](https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit#gid=0). Thanks again for looking at it, I really appreciate all the help. – swolfe2 Mar 06 '18 at 13:06
  • 1
    At the bottom of this thread, Ben.T showed that headers=False will eliminate that row. Thanks! – swolfe2 Mar 06 '18 at 13:12
  • @swolfe2 Looks like Ben T. was able to help you out with the headers=False. Glad we could all help! pandas is a great tool for working with structured csv / delimited data, I'd highly recommend reading up on it when you have time. It's basically my python data workhorse. Happy coding! – zyd Mar 06 '18 at 17:28
2

It seems that the variable called values is a list containing a list for each row of your google spreadsheet. I tried:

import openpyxl
# to recreate a list of list
list_input = [['one','two'],['second row','with',14]]

wb = openpyxl.Workbook()
ws = wb.active
ws.append(list_input)
wb.save("test.xlsx")

and I got the same error than your update.

By replacing the line ws.append(list_input) by

for row in list_input:
    ws.append(row)

My code worked, so I would replace your line ws.append([values]) by

for row in values:
    ws.append(row)

Hope it is some help for you.

Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • Hey Ben, thanks for the reply. I attempted what you suggested, but it only wrote a single row to Excel file. Here's my current code: `values = result.get('values', []) if not values: print('No data found.') else: for row in values: # to recreate a list of list wb = openpyxl.Workbook() ws = wb.active ws.append(row) wb.save("test.xlsx")` Any ideas? Thanks again! – swolfe2 Mar 05 '18 at 21:12
  • If I understand well, the only row in you excel file should be the last one of your google spreadsheet. The two lines: `wb = openpyxl.Workbook()` and `ws = wb.active` should not be in your loop for, just the line `ws.append(row)`. Otherwise for every row, you recreate a new workbook, deleting your previous entry. – Ben.T Mar 05 '18 at 21:26
  • 1
    Got it! That makes sense. Changing to this worked great: `if not values: print('No data found.') else: wb = openpyxl.Workbook() ws = wb.active for row in values: # to recreate a list of list ws.append(row) wb.save("test.xlsx")` Thanks for the help! – swolfe2 Mar 05 '18 at 21:30
  • 1
    Welcome. For a large data frame, using pandas and the solution of @zyd is probably faster than a loop `for` on each row. – Ben.T Mar 05 '18 at 21:43
  • 1
    and to answer to your question from the comment under zyd answer (sorry I can add a comment yet under his answer), according to [link](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html) you can try `df.to_excel(writer, 'Raw Data', header = False, index = False)` instead of `df.to_excel(writer, 'Raw Data')` in your code. Let me know if it does what you want. – Ben.T Mar 05 '18 at 22:10
  • Hey @Ben, you're exactly right. The header = False AND index = False solved it. Thanks for the time and the help! : ) – swolfe2 Mar 06 '18 at 13:08
0

There's a simpler way to use the googlesheet API to convert the gsheet to excel. You can make use of the service field to call the function as below.

def convert_to_excel(file_id: str, excel_file_path: str):
   byteData = service.files().export_media(fileId=file_id, mimeType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet').execute()
   with open(excel_file_path,'wb') as f:
      f.write(byteData)
      f.close()

This method makes use of vendor specific mime types - (https://en.wikipedia.org/wiki/Media_type), (What is the meaning of "vnd" in MIME types?). Parameters are : file ID(google sheet id), excel file path.