I am trying to obtain the COVID-19 data in the different worksheets of the following Google Sheet. The sheet is open for public access, but the URL returns only the first worksheet. I want to scrape all the worksheets. Can anyone help? Here's the Google Sheet link:
Viewed 4,130 times
1 Answer
10
You can do it using requests. All the tables are in the source of one HTML document. Simply iterate through the tables and write to a CSV.
from bs4 import BeautifulSoup
import csv
import requests

# Fetch the published-sheet HTML; every worksheet appears as a <table> in this one page.
html = requests.get('https://docs.google.com/spreadsheets/d/e/2PACX-1vSc_2y5N0I67wDU38DjDh35IZSIS30rQf7_NYZhtYYGU1jJYT6_kDx4YpF-qw0LSlGsBYP8pqM_a1Pd/pubhtml').text
soup = BeautifulSoup(html, "lxml")

# Write each worksheet's table to its own numbered CSV file.
for index, table in enumerate(soup.find_all("table")):
    with open(str(index) + ".csv", "w") as f:
        wr = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
        wr.writerows([[td.text for td in row.find_all("td")]
                      for row in table.find_all("tr")])

Dan-Dev
- This was NEAT!! I have a couple of suggestions, though: one is to have a `lineterminator='\n'` parameter passed to the `csv.writer()` method, and secondly, an `encoding='utf8'` in the `open()` method. – tidakdiinginkan Apr 11 '20 at 06:58
- Thank you so much, Dan! – Aniket Gaikwad Apr 15 '20 at 15:24
- Very cool ("牛逼") – getting the sheet without a Google token! – Johnny Feb 25 '22 at 10:46
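The comment's two suggestions can be folded into the answer like this. A minimal sketch, with an inline HTML snippet standing in for the live published-sheet page (which has one `<table>` per worksheet); the file names and sample data here are illustrative, not from the original:

```python
from bs4 import BeautifulSoup
import csv

# Stand-in for the published-sheet HTML: one <table> per worksheet.
html = """
<table><tr><td>date</td><td>cases</td></tr>
<tr><td>2020-04-01</td><td>12</td></tr></table>
<table><tr><td>state</td><td>deaths</td></tr>
<tr><td>MH</td><td>3</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
for index, table in enumerate(soup.find_all("table")):
    # encoding='utf8' keeps non-ASCII cell text intact; newline='' plus
    # lineterminator='\n' avoids blank lines on Windows, per the comment.
    with open(str(index) + ".csv", "w", encoding="utf8", newline="") as f:
        wr = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
        wr.writerows([[td.text for td in row.find_all("td")]
                      for row in table.find_all("tr")])
```

For the real sheet, swap the inline `html` string back for the `requests.get(...).text` call from the answer.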