14

I can download a CSV file from Google Docs with wget:

wget --no-check-certificate --output-document=locations.csv 'https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv'

But I can't download the same csv with Python:

import urllib2

request = urllib2.Request('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
opener = urllib2.build_opener()
data = opener.open(request).read()
print(data)

The result is the Google login page. What am I doing wrong?

Boris Verkhovskiy
debuti

6 Answers

30

Just use requests; it's way better than urllib:

import requests
response = requests.get('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
assert response.status_code == 200, 'Wrong status code'
print(response.content)

You can install it with

pip install requests
Boris Verkhovskiy
Jayson Reis
    nitpick: You should use `response.raise_for_status()` instead of asserting `response.status_code == 200` for the simple reason that `assert` statements are stripped out when python is run with optimization flags. – Don Spaulding Apr 09 '13 at 15:21
  • 1
    That was just for explanation, if you deploy code with python optimize, all asserts will be stripped. – Jayson Reis Sep 28 '13 at 11:38
  • Hey @JaysonReis, is there an option to choose which sheet you need? And also, when you specify only `&output=csv`, are you getting only one sheet or all of them? – Edmond Jun 08 '20 at 15:10
  • Did not work for me. I had to use `'https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc/export?format=csv'` – Eduardo Pignatelli Nov 11 '20 at 18:45
  • 1
    These URLs likely changed over the years, it is better if you checkout google doc's API and maybe use some library that already implements those functionalities. – Jayson Reis Nov 12 '20 at 20:55
12

You're not storing cookies.

First let me say that I completely endorse the recommendation to use the most-excellent requests library.

However, if you need to do this in vanilla Python 2, the problem lies in the fact that Google is bouncing you around via HTTP 302 redirects, and it expects you to remember the cookies it's setting with each response. When it detects that you aren't storing cookies, it redirects you to the login page.

By default, urllib2.urlopen (or the opener returned from build_opener) will follow 302 redirects, but it won't store HTTP cookies. You have to teach your opener how to do that. Like so:

>>> from cookielib import CookieJar
>>> from urllib2 import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
>>> data = resp.read()

Again, use requests if at all possible, but if it's not possible, the standard library can get the job done.
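For Python 3, the same trick works with the renamed standard-library modules (`http.cookiejar` and `urllib.request`); a minimal sketch, with the helper name being illustrative:

```python
from http.cookiejar import CookieJar
from urllib.request import build_opener, HTTPCookieProcessor

def make_cookie_opener():
    # Build an opener that follows 302 redirects *and* remembers the
    # cookies set along the way -- exactly what Google's login bounce checks.
    return build_opener(HTTPCookieProcessor(CookieJar()))

opener = make_cookie_opener()
# data = opener.open(csv_url).read()  # would now reach the CSV, not the login page
```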

Don Spaulding
  • +1 for providing plain-vanilla python 2 solution. Not everyone necessarily has the ability or authorization to just install new libraries. – dreftymac Sep 04 '13 at 02:34
3

Doesn't get any simpler than using Pandas:

import pandas as pd

def build_sheet_url(doc_id, sheet_id):
    return f'https://docs.google.com/spreadsheets/d/{doc_id}/export?format=csv&gid={sheet_id}'

def write_df_to_local(df, file_path):
    df.to_csv(file_path)

doc_id = 'DOC_ID'
sheet_id = 'SHEET_ID'
sheet_url = build_sheet_url(doc_id, sheet_id)
df = pd.read_csv(sheet_url)
file_path = 'FILE_PATH'
write_df_to_local(df, file_path)
Franco Piccolo
2

The requests library is great and the gold standard for HTTP requests from Python. However, this download-link style, while not deprecated yet, is unlikely to last: the downloadUrl field in the Google Drive API v2 is already deprecated. The currently accepted way of exporting Google Sheets as CSV is by using the (current) Google Drive API.

So why the Drive API? Isn't this supposed to be something for the Sheets API instead? Well, the Sheets API is for spreadsheet-oriented functionality, i.e., data formatting, column resize, creating charts, cell validation, etc., while the Drive API is for file-oriented functionality, i.e., import/export.

Below is a complete cmd-line solution. (If you don't do Python, you can use it as pseudocode and pick any language supported by the Google APIs Client Libraries.) For the code snippet, assume the most current Sheet named inventory (older files with that name are ignored) and DRIVE is the API service endpoint:

import os

FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

# query for latest file named FILENAME
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

# if found, export 1st matching Sheets file as CSV
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()

    # if non-empty file
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')

If your Sheet is large, you may have to export it in chunks -- see this page on how to do that. If you're generally new to Google APIs, I have a (somewhat dated but) user-friendly intro video for you. (There are two follow-up videos that may be useful too.)

wescpy
0

I would use requests:

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
data = r.content
jassinm
0

You need the requests library (`pip install requests`):

import requests

link = "https://docs.google.com/spreadsheets/d/11D0KAvm_ERXZ3XMgft5DM19IREaNvGargPlvW8e2DXg/edit#gid=0"
# drop the trailing /edit#gid=0 segment and append the CSV export path
link = link.rpartition('/')[0] + "/export?format=csv"
file = requests.get(link)
if file.status_code == 200:
    fileContent = file.content.decode('utf-8')
    print(fileContent)
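The link rewriting above can be factored into a small pure-string helper, which also lets you pick a specific worksheet tab via `gid` (the function name and `gid` parameter are illustrative, not from the answer):

```python
def edit_link_to_csv_link(link, gid=None):
    # Turn a .../edit#gid=N sharing link into a CSV export link.
    base = link.rpartition('/')[0]      # drop the trailing /edit#gid=N segment
    url = base + '/export?format=csv'
    if gid is not None:
        url += '&gid=%s' % gid          # select a specific worksheet tab
    return url

print(edit_link_to_csv_link(
    "https://docs.google.com/spreadsheets/d/11D0KAvm_ERXZ3XMgft5DM19IREaNvGargPlvW8e2DXg/edit#gid=0"))
```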
Sadia