Beautifulsoup/Writer returns an empty cell when exported to a CSV

Question

I'm scraping a website to get each person's name, birth and death dates, and the name of the cemetery they are buried in. For the most part it works well; however, when I exported the text to a CSV, I noticed that a blank cell is inserted in the name column after each page. I have a feeling this is related to the loop rather than an HTML tag, but I'm still learning. Any advice is welcome! Thanks, everyone.

Here's an example of the problem in Excel

import requests
from bs4 import BeautifulSoup
import csv

api = 'https://www.findagrave.com/memorial/search?'
name = 'firstname=&middlename=&lastname='
years = 'birthyear=&birthyearfilter=&deathyear=&deathyearfilter='
place = 'location=Yulee%2C+Nassau+County%2C+Florida%2C+United+States+of+America&locationId=city_28711'
memorialid = 'memorialid=&mcid='
linkname = 'linkedToName='
daterange = 'datefilter='
plotnum = 'orderby=r&plot='
page = 'page='
url = api + name + "&" + years + "&" + place + "&" + memorialid + "&" + linkname + "&" + daterange + "&" + plotnum + '&' + page

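# Assumption: `headers` was not defined in the original post; a basic User-Agent is used here.
headers = {'User-Agent': 'Mozilla/5.0'}

for page_no in range(1, 93):
   url_final = url + str(page_no)
   response = requests.get(url_final, headers=headers)

   #print(response)
   soup = BeautifulSoup(response.content, "html.parser")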
   graves = soup.find_all('div', {'class':'memorial-item py-1'})
   #print(graves)

   

   #Getting the Names 
   grave_name = soup.find_all('h2', {'class':'name-grave'}) 

   #Dates
   dates = soup.find_all('b', {'class':'birthDeathDates'})

   #Graveyard Name
   grave_yard = soup.find_all('button', {'role': 'link'})
   #print(grave_yard)

   dataset = [(x.text, y.text, z.text) for x,y,z in zip(grave_name, dates, grave_yard)]
   with open('Fernandiabeach3.csv', 'a',) as csvfile:
      writer = csv.writer(csvfile)
      writer.writerows(dataset)

I've tried to see if there were any similar tags happening at the beginning of each new page, but I couldn't find anything that stood out.

score 0 · Accepted Answer · answered Dec 05 '22 at 15:57

Change

with open('Fernandiabeach3.csv', 'a',) as csvfile:

to

with open('Fernandiabeach3.csv', 'a', newline='') as csvfile:

In simple terms:

Because the file was opened without specifying how newlines should be handled, an extra blank row was written after each record.

In technical terms:

csv.writer controls line endings itself and writes \r\n into the file directly. In Python 3 the file must be opened in untranslated text mode with newline='' (empty string), for example open('Fernandiabeach3.csv', 'a', newline=''), or it will write \r\r\n on Windows, where the default text mode translates each \n into \r\n.
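
A minimal sketch of what that looks like on disk, using a hypothetical file name (demo.csv) and a made-up row. It appends the row with newline='' and reads the raw bytes back, so the line ending is visible:

```python
import csv
import os
import tempfile

def write_rows(path, rows, **open_kwargs):
    """Append rows to a CSV file; open_kwargs lets the caller pass newline=''."""
    with open(path, 'a', **open_kwargs) as csvfile:
        csv.writer(csvfile).writerows(rows)

def raw_bytes(path):
    """Return the file contents as bytes so line endings are visible."""
    with open(path, 'rb') as f:
        return f.read()

# Made-up sample row, not data from the site.
rows = [('Jane Doe', '1900-1980', 'Example Cemetery')]

# Hypothetical file in a temporary directory.
fixed_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_rows(fixed_path, rows, newline='')

fixed = raw_bytes(fixed_path)
print(fixed)  # each row ends in a single \r\n on every platform
```

Calling write_rows without newline='' on Windows should show the doubled \r\r\n instead, which is what Excel displays as an empty row between records.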

Hope this helps. Happy Coding :)

Amysoj-Louis

Hi louis joseph, thank you so much for this! This definitely saves me a step with cleaning the CSV (I was pulling it into Stata and deleting every odd row). There's still this one problem with the CSV - it still includes a blank in the name field right before a new record on the next page. I'm adding a screenshot to the post now! PS: I've definitely been enjoying learning python through this project - it's definitely very different than Stata programming - my bread and butter program. – Sanderson10453 Dec 05 '22 at 16:16
Your code is not scraping any data, so the field is being returned as blank. – Amysoj-Louis Dec 05 '22 at 16:34
While looking at how the site receives its data, I found that it has an API URL that returns JSON output, which you can convert to CSV. – Amysoj-Louis Dec 05 '22 at 16:36
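
If the blank comes from an element whose text is empty, one possible workaround is to drop such rows before writing them. This is a minimal sketch, assuming rows are (name, dates, cemetery) tuples like the question's dataset. The helper name drop_blank_names and the sample rows are hypothetical, and this only hides the symptom rather than identifying which element produced it.

```python
def drop_blank_names(rows):
    """Strip whitespace from every cell and keep only rows with a non-empty name."""
    cleaned = [tuple(cell.strip() for cell in row) for row in rows]
    return [row for row in cleaned if row[0]]

# Made-up rows standing in for the scraped dataset.
sample = [
    ('Jane Doe\n', '1900-1980', 'Example Cemetery'),
    ('   ', '', ''),  # a row whose name cell is blank
    ('John Roe', '1910-1990', 'Example Cemetery'),
]

cleaned_sample = drop_blank_names(sample)
print(cleaned_sample)
```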
https://www.findagrave.com/memorial/search?ajax=true&skip=40&limit=20&firstname=&middlename=&lastname=&birthyear=&birthyearfilter=&deathyear=&deathyearfilter=&locationId=city_28711&location=Yulee%2C%20Nassau%20County%2C%20Florida%2C%20United%20States%20of%20America&memorialid=&datefilter=&orderby=r&photofilter=&gpsfilter=&famous=&cenotaph=&flowers=&sponsored=&noCemetery=&includeNickName=&includeMaidenName=&includeTitles=&hasPlot=&plot=&partialLastName=&exactName=&linkedToName=&fuzzyNames=&mcid= – Amysoj-Louis Dec 05 '22 at 16:36
Thank you louis! I'm embarrassed I didn't know this API existed. You have saved me so much time and helped me with writer syntax. I really appreciate it! – Sanderson10453 Dec 05 '22 at 16:55
Happy to help. I just found out the API returns at most 100 results per request. Hope this helps. Happy coding :) – Amysoj-Louis Dec 05 '22 at 17:04
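
A rough sketch of paging through that endpoint with skip and limit, based only on the URL posted in the comments. The helper name page_urls and the total of 250 are made up for illustration, and it is an assumption that the empty query parameters can be left out. The shape of the JSON response is not shown in this thread, so the sketch stops at building the URLs.

```python
from urllib.parse import urlencode

BASE = 'https://www.findagrave.com/memorial/search'

def page_urls(total, limit=100):
    """Yield one search URL per batch of `limit` results, advancing `skip` each time."""
    for skip in range(0, total, limit):
        params = {
            'ajax': 'true',
            'skip': skip,
            'limit': limit,
            'locationId': 'city_28711',
            'location': 'Yulee, Nassau County, Florida, United States of America',
            'orderby': 'r',
        }
        yield BASE + '?' + urlencode(params)

# Hypothetical total of 250 results, fetched 100 at a time.
urls = list(page_urls(250))
for u in urls:
    print(u)
```

Each URL could then be passed to requests.get, and the decoded JSON inspected to see which keys hold the name, dates, and cemetery before writing them out with csv.writer.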
