Okay, I'm at my wit's end here. For a class assignment, we are supposed to scrape data from the wunderground.com website. Either the script fails with error messages, or it runs without errors but the .txt file ends up with no data in it. It's frustrating, because I need to get this working. Here is my code:
import urllib2
from bs4 import BeautifulSoup

f = open('wunder-data1.txt', 'w')
for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)
        timestamp = '2009' + mStamp + dStamp
        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
f.close()
Also, sorry if the indentation isn't right; it may have been lost when I pasted the code. I'm not very good at Python yet.
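For reference, the month-length checks and the manual zero-padding in the loop above can also be handed off to Python's `datetime` module. This is a minimal sketch that only generates dates, timestamps, and URLs (it makes no network requests); `daily_stamps` and `history_url` are hypothetical helper names, and the URL pattern is copied from the code above. Unlike the hard-coded `d > 28` check, stepping one day at a time also handles leap years.

```python
from datetime import date, timedelta


def daily_stamps(year):
    """Yield (datetime.date, 'YYYYMMDD') for every calendar day of `year`.

    Adding one day at a time lets datetime handle month lengths and leap
    years, and strftime('%Y%m%d') zero-pads the month and day.
    """
    day = date(year, 1, 1)
    one_day = timedelta(days=1)
    while day.year == year:
        yield day, day.strftime('%Y%m%d')
        day += one_day


def history_url(day, station='KBUF'):
    """Hypothetical helper: build the DailyHistory URL used in the script."""
    return ("http://www.wunderground.com/history/airport/%s/%d/%d/%d/DailyHistory.html"
            % (station, day.year, day.month, day.day))
```

With these, `for day, timestamp in daily_stamps(2009): url = history_url(day)` would replace the nested loops, the `break` checks, and the `mStamp`/`dStamp` padding.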
UPDATE: Someone answered the question below, and their fix worked, but then I realized I was pulling the wrong data (oops). So I changed the code to this:
import codecs
import urllib2
from bs4 import BeautifulSoup

f = codecs.open('wunder-data2.txt', 'w', 'utf-8')
for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        dayTemp = soup.findAll(attrs={"class":"wx-value"})[5].span.string
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)
        timestamp = '2009' + mStamp + dStamp
        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
f.close()
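On the output side, here is a sketch of writing the scraped pairs, assuming Python 3 (where `urllib2` became `urllib.request`). Opening the file once with an explicit UTF-8 encoding inside a `with` block makes the manual `.encode()` call unnecessary and guarantees the file is closed even if a request fails partway through the year. `write_rows` is a hypothetical helper, and skipping `None` values is an assumed policy: BeautifulSoup's `.string` returns `None` when a tag does not contain exactly one string.

```python
import csv
import io


def write_rows(path, rows):
    """Write (timestamp, value) pairs to `path` as UTF-8, one per line.

    Assumes Python 3: values are str, and the file object does the UTF-8
    encoding, so non-ASCII text such as a degree sign needs no .encode().
    """
    # The with-block closes (and flushes) the file even if an exception
    # is raised while iterating over `rows`.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        # lineterminator='\n' matches the '\n' used in the original script.
        writer = csv.writer(f, lineterminator='\n')
        for timestamp, value in rows:
            if value is None:
                # Hypothetical policy: skip days where nothing was scraped.
                continue
            writer.writerow([timestamp, value])
```

Because `rows` can be any iterable, a generator that scrapes one day per step could be passed in directly.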
So I'm pretty unsure. What I'm trying to do is data scrape the