I am taking a course on Python 3, and in the scraping section we have an assignment to scrape the http://quotes.toscrape.com/ website and collect the text, the author, and a link to the author's bio for all of the quotes, including the ones on the "next" pages. I have done this, but every time I move to a new page I get one extra row that I had originally intended as the header.
import requests
from bs4 import BeautifulSoup
import csv
from time import sleep

base_url = "http://quotes.toscrape.com"
url = "/page/1"

f = open("scraping_project_final.csv", "w")
f.truncate()
f.close()

while url:
    with open("scraping_project_final.csv", "a") as file:
        csv_writer = csv.writer(file)
        csv_writer.writerow(["text", "name", "url"])
        response = requests.get(f"{base_url}{url}")
        print(f"Scraping {base_url}{url}")
        soup = BeautifulSoup(response.text, "html.parser")
        quotes = soup.find_all(class_="quote")
        for quote in quotes:
            txt = quote.find(class_="text").get_text()
            author = quote.find(class_="author").get_text()
            link = quote.find("a")["href"]
            csv_writer.writerow([txt, author, link])
        next_page = soup.find(class_="next")
        url = next_page.find("a")["href"] if next_page else None
        # sleep(2)
So, the issue I have is that the initial writerow call actually writes one header row on every iteration of the loop. How do I avoid this? I would like to stick with this approach and not use DictWriter if possible. I have added an image of the CSV output below; you can see that after every ten rows there is a row containing just: text, name, url.
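Is the right fix simply to open the file once and write the header a single time before the loop, something like the sketch below? (This keeps my original variable names; the newline="" argument is something I picked up from the csv module docs to avoid blank lines on some platforms, and I am not sure it is strictly needed here.)

import requests
from bs4 import BeautifulSoup
import csv

base_url = "http://quotes.toscrape.com"
url = "/page/1"

# Open the file once in "w" mode and write the header a single time,
# then keep appending quote rows from each page inside the loop.
with open("scraping_project_final.csv", "w", newline="") as file:
    csv_writer = csv.writer(file)
    csv_writer.writerow(["text", "name", "url"])
    while url:
        response = requests.get(f"{base_url}{url}")
        print(f"Scraping {base_url}{url}")
        soup = BeautifulSoup(response.text, "html.parser")
        for quote in soup.find_all(class_="quote"):
            txt = quote.find(class_="text").get_text()
            author = quote.find(class_="author").get_text()
            link = quote.find("a")["href"]
            csv_writer.writerow([txt, author, link])
        # follow the "next" link until there is no further page
        next_page = soup.find(class_="next")
        url = next_page.find("a")["href"] if next_page else None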