
I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. With this data I would like to geocode it, place it on a map, and keep a local copy on my computer.

I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the script to export the data into a CSV file with the fields I need.

My script is below. I need help writing code that will transfer my extracted data into a CSV file and save it to my desktop.

import csv
import requests 
from bs4 import BeautifulSoup
url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})


for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except IndexError:
        pass

for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except IndexError:
        pass

This is what I currently get when I run the script. I want to take this data and make it into a CSV table for geocoding later.

1801 Merrimac Trl
Williamsburg, Virginia 23185-5905

12551 Glades Rd
Boca Raton, Florida 33498-6830
Preserve Golf Club 
13601 SW 115th Ave
Dunnellon, Florida 34432-5621
1000 Acres Ranch Resort 
465 Warrensburg Rd
Stony Creek, New York 12878-1613
1757 Golf Club 
45120 Waxpool Rd
Dulles, Virginia 20166-6923
27 Pines Golf Course 
5611 Silverdale Rd
Sturgeon Bay, Wisconsin 54235-8308
3 Creek Ranch Golf Club 
2625 S Park Loop Rd
Jackson, Wyoming 83001-9473
3 Lakes Golf Course 
6700 Saltsburg Rd
Pittsburgh, Pennsylvania 15235-2130
3 Par At Four Points 
8110 Aero Dr
San Diego, California 92123-1715
3 Parks Fairways 
3841 N Florence Blvd
Florence, Arizona 85132
3-30 Golf & Country Club 
101 Country Club Lane
Lowden, Iowa 52255
401 Par Golf 
5715 Fayetteville Rd
Raleigh, North Carolina 27603-4525
93 Golf Ranch 
406 E 200 S
Jerome, Idaho 83338-6731
A 1 Golf Center 
1805 East Highway 30
Rockwall, Texas 75087
A H Blank Municipal Course 
808 County Line Rd
Des Moines, Iowa 50320-6706
A-Bar-A Ranch Golf Course 
Highway 230
Encampment, Wyoming 82325
A-Ga-Ming Golf Resort, Sundance 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A-Ga-Ming Golf Resort, Torch 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A. C. Read Golf Club, Bayou 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
A. C. Read Golf Club, Bayview 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508

2 Answers


All you really need to do here is put your output in a list and then use the csv library to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus just on views-field-nothing, you could do something like:

courses_list = []

for item in g_data2:
   try:
      name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
   except IndexError:
      name = ''
   try:
      address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
   except IndexError:
      address1 = ''
   try:
      address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
   except IndexError:
      address2 = ''

   course = [name, address1, address2]
   courses_list.append(course)

This will put the courses in a list; next you can write them to a CSV like so:

import csv

with open('filename.csv', 'w', newline='') as f:
   writer = csv.writer(f)
   for row in courses_list:
      writer.writerow(row)
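
To save the file somewhere specific, such as the desktop, you can pass a full path to open(). A minimal sketch, assuming a standard macOS/Linux home layout where the desktop lives at ~/Desktop (the exact folder name is an assumption and may differ on other systems):

import csv
import os

# Assumption: the desktop is at ~/Desktop; adjust for other setups.
out_path = os.path.join(os.path.expanduser("~"), "Desktop", "courses.csv")

with open(out_path, 'w', newline='') as f:
   csv.writer(f).writerows(courses_list)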
AustinC
  • Thank you for your help! So I used views-field-nothing-1 to get ownership and tell whether a course is private or public. How do I incorporate that into my given script? And since the list goes to around 20 pages, how do I scrape the data from the other pages? Lastly, how do I save the CSV file to my local drive on a Mac? – Gonzalo68 Jun 25 '15 at 22:16
  • NVM, I figured out how it gets saved. Is it possible to specify a folder? How do I make my script loop over the other parts of the website for the other data? How do I create headers for my CSV file? Thank you so much, this is so helpful! – Gonzalo68 Jun 25 '15 at 22:22
  • You might want to read a tutorial on Python lists. A header row is just another list you push to your master list, so before the loop that appends courses you could just do: courses_list.append(['Name', 'Address 1', 'Address 2']) – AustinC Jun 26 '15 at 00:06
  • I can't really speak to other parts of the website - I'm guessing that what you're going to want to do is create a master for loop that goes through the pages. So let's say that every page is www.pga.com/golf-courses/x.html where x is the search string - you'll have to figure out how to alter that string to give you all the various pages you want. Generate a big list of parameters, maybe zip_codes=[20002,20770,77803,...], and then loop through them: for zip_code in zip_codes: url = base_url + zip_code, followed by your scraping code (a rough sketch follows after these comments). – AustinC Jun 26 '15 at 00:09
  • But these are big questions! I suggest looking at a few python tutorials to get comfortable with some of these basic manipulations involving lists and other data types like dicts. – AustinC Jun 26 '15 at 00:10
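
A rough sketch that pulls the comments' suggestions together - a header row plus a master loop over result pages. This is illustrative only: the page query parameter and the count of 20 pages are assumptions (taken from the comments above) and must be checked against the site's real pagination links.

import csv
import requests
from bs4 import BeautifulSoup

# Assumption: the search results accept a "page" query parameter;
# verify the real parameter name in the site's pagination links.
base_url = ("http://www.pga.com/golf-courses/search?searchbox=Course+Name"
            "&searchbox_zip=ZIP&distance=50&price_range=0"
            "&course_type=both&has_events=0&page={}")

courses_list = [["Name", "Address", "City/State/ZIP"]]  # header row first

for page in range(20):  # roughly 20 pages, per the comments
    r = requests.get(base_url.format(page))
    soup = BeautifulSoup(r.content, "html.parser")
    for item in soup.find_all("div", {"class": "views-field-nothing"}):
        try:
            name = item.find_all("div", {"class": "views-field-title"})[0].text.strip()
        except IndexError:
            name = ''
        try:
            address1 = item.find_all("div", {"class": "views-field-address"})[0].text.strip()
        except IndexError:
            address1 = ''
        try:
            address2 = item.find_all("div", {"class": "views-field-city-state-zip"})[0].text.strip()
        except IndexError:
            address2 = ''
        courses_list.append([name, address1, address2])

with open('courses.csv', 'w', newline='') as f:
    csv.writer(f).writerows(courses_list)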

First of all, you want to put all of your items in a list and then write to the file later, in case there is an error while you are scraping. Instead of printing, just append to a list. Then you can write to a CSV file:

import csv

# main_list holds the rows collected during scraping
with open('filename.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    for i in main_list:
        csv_writer.writerow(i)
user2438604