
I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. With this data I would like to geocode it, place it on a map, and keep a local copy on my computer.

I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the script to export the data into a CSV file with the fields I need.

My script is below. I need help writing code that will transfer my extracted data into a CSV file and save it to my desktop.

import csv
import requests 
from bs4 import BeautifulSoup
url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})


for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except IndexError:
        pass

for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except IndexError:
        pass

This is what I currently get when I run the script. I want to take this data and make it into a CSV table for geocoding later.

1801 Merrimac Trl
Williamsburg, Virginia 23185-5905

12551 Glades Rd
Boca Raton, Florida 33498-6830
Preserve Golf Club 
13601 SW 115th Ave
Dunnellon, Florida 34432-5621
1000 Acres Ranch Resort 
465 Warrensburg Rd
Stony Creek, New York 12878-1613
1757 Golf Club 
45120 Waxpool Rd
Dulles, Virginia 20166-6923
27 Pines Golf Course 
5611 Silverdale Rd
Sturgeon Bay, Wisconsin 54235-8308
3 Creek Ranch Golf Club 
2625 S Park Loop Rd
Jackson, Wyoming 83001-9473
3 Lakes Golf Course 
6700 Saltsburg Rd
Pittsburgh, Pennsylvania 15235-2130
3 Par At Four Points 
8110 Aero Dr
San Diego, California 92123-1715
3 Parks Fairways 
3841 N Florence Blvd
Florence, Arizona 85132
3-30 Golf & Country Club 
101 Country Club Lane
Lowden, Iowa 52255
401 Par Golf 
5715 Fayetteville Rd
Raleigh, North Carolina 27603-4525
93 Golf Ranch 
406 E 200 S
Jerome, Idaho 83338-6731
A 1 Golf Center 
1805 East Highway 30
Rockwall, Texas 75087
A H Blank Municipal Course 
808 County Line Rd
Des Moines, Iowa 50320-6706
A-Bar-A Ranch Golf Course 
Highway 230
Encampment, Wyoming 82325
A-Ga-Ming Golf Resort, Sundance 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A-Ga-Ming Golf Resort, Torch 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A. C. Read Golf Club, Bayou 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
A. C. Read Golf Club, Bayview 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508

2 Answers


All you really need to do here is put your output in a list and then use the csv library to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus just on views-field-nothing, you could do something like:

courses_list = []

for item in g_data2:
   try:
      name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
   except IndexError:
      name = ''
   try:
      address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
   except IndexError:
      address1 = ''
   try:
      address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
   except IndexError:
      address2 = ''

   course = [name, address1, address2]
   courses_list.append(course)

This will put the courses in a list; next you can write them to a CSV like so:

import csv

with open('filename.csv', 'w', newline='') as f:
   writer = csv.writer(f)
   for row in courses_list:
      writer.writerow(row)
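
To save the file somewhere specific, such as the desktop, you can pass a full path to open(). A minimal sketch, assuming a standard macOS/Linux home layout where the desktop lives at ~/Desktop (the exact folder name is an assumption and may differ on other systems):

import csv
import os

# Assumption: the desktop is at ~/Desktop; adjust for other setups.
out_path = os.path.join(os.path.expanduser("~"), "Desktop", "courses.csv")

with open(out_path, 'w', newline='') as f:
   csv.writer(f).writerows(courses_list)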
AustinC
  • Thank you for your help! So I used views-field-nothing-1 to get ownership and tell whether a course is private or public. How do I incorporate that into my given script? And since the list goes to around 20 pages, how do I scrape the data from the other pages? Lastly, how do I save the CSV file to my local drive on a Mac? – Gonzalo68 Jun 25 '15 at 22:16
  • NVM, I figured out how it gets saved. Is it possible to specify a folder? How do I make my script loop over the other parts of the website for the other data? How do I create headers for my CSV file? Thank you so much, this is so helpful! – Gonzalo68 Jun 25 '15 at 22:22
  • You might want to read a tutorial on Python lists. A header row is just another list you push to your master list, so before the loop that appends courses you could just do: courses_list.append(['Name', 'Address 1', 'Address 2']) – AustinC Jun 26 '15 at 00:06
  • I can't really speak to other parts of the website - I'm guessing that what you're going to want to do is create a master for loop that goes through the pages. So let's say that every page is www.pga.com/golf-courses/x.html where x is the search string - you'll have to figure out how to alter that string to give you all the various pages you want. Generate a big list of parameters, maybe zip_codes=[20002,20770,77803,...], and then loop through them: for zip_code in zip_codes: url = base_url + zip_code, followed by your scraping code (a rough sketch follows after these comments). – AustinC Jun 26 '15 at 00:09
  • But these are big questions! I suggest looking at a few python tutorials to get comfortable with some of these basic manipulations involving lists and other data types like dicts. – AustinC Jun 26 '15 at 00:10
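
A rough sketch that pulls the comments' suggestions together - a header row plus a master loop over result pages. This is illustrative only: the page query parameter and the count of 20 pages are assumptions (taken from the comments above) and must be checked against the site's real pagination links.

import csv
import requests
from bs4 import BeautifulSoup

# Assumption: the search results accept a "page" query parameter;
# verify the real parameter name in the site's pagination links.
base_url = ("http://www.pga.com/golf-courses/search?searchbox=Course+Name"
            "&searchbox_zip=ZIP&distance=50&price_range=0"
            "&course_type=both&has_events=0&page={}")

courses_list = [["Name", "Address", "City/State/ZIP"]]  # header row first

for page in range(20):  # roughly 20 pages, per the comments
    r = requests.get(base_url.format(page))
    soup = BeautifulSoup(r.content, "html.parser")
    for item in soup.find_all("div", {"class": "views-field-nothing"}):
        try:
            name = item.find_all("div", {"class": "views-field-title"})[0].text.strip()
        except IndexError:
            name = ''
        try:
            address1 = item.find_all("div", {"class": "views-field-address"})[0].text.strip()
        except IndexError:
            address1 = ''
        try:
            address2 = item.find_all("div", {"class": "views-field-city-state-zip"})[0].text.strip()
        except IndexError:
            address2 = ''
        courses_list.append([name, address1, address2])

with open('courses.csv', 'w', newline='') as f:
    csv.writer(f).writerows(courses_list)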

First of all, you want to put all of your items in a list and then write to the file later, in case there is an error while you are scraping. Instead of printing, just append to a list. Then you can write to a CSV file:

import csv

# main_list holds the rows collected during scraping
with open('filename.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    for i in main_list:
        csv_writer.writerow(i)
user2438604