Okay, I'm at my wit's end here. For my class, we are supposed to scrape data from the wunderground.com website. We keep running into issues: either we get error messages, or the code runs fine but the .txt file contains no data. It's pretty annoying, because I need to get this done! Here is my code:

f = open('wunder-data1.txt', 'w')
for m in range(1, 13):
for d in range(1, 32):
    if (m == 2 and d > 28):
        break
    elif (m in [4, 6, 9, 11] and d > 30):
        break
    url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page, "html.parser")
    dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)
    if len(str(m)) < 2:
        mStamp = '0' + str(m)
    else:
        mStamp = str(m)
    if len(str(d)) < 2:
        dStamp = '0' +str(d)
    else:
        dStamp = str(d)
    timestamp = '2009' + mStamp +dStamp
    f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
    f.close()

Also, sorry: the indentation shown here is probably not the same as in my actual Python file. I'm not any good at this.

UPDATE: So someone answered the question below, and it worked, but I realized I was pulling the wrong data (oops). So I put in this:

    import codecs
    import urllib2
    from bs4 import BeautifulSoup

    f = codecs.open('wunder-data2.txt', 'w', 'utf-8')

    for m in range(1, 13):
        for d in range(1, 32):
            if (m == 2 and d > 28):
                break
            elif (m in [4, 6, 9, 11] and d > 30):
                break

            url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
            page = urllib2.urlopen(url)
            soup = BeautifulSoup(page, "html.parser")

            dayTemp = soup.findAll(attrs={"class":"wx-value"})[5].span.string
            if len(str(m)) < 2:
                mStamp = '0' + str(m)
            else:
                mStamp = str(m)
            if len(str(d)) < 2:
                dStamp = '0' +str(d)
            else:
                dStamp = str(d)

            timestamp = '2009' + mStamp +dStamp

            f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')

    f.close()

So I'm pretty unsure. What I'm trying to do is data scrape the

  • Please [edit] your post to fix your indentation so the posted code actually runs. Additionally, please add the **full text** of any errors or tracebacks. – MattDMo Jan 15 '17 at 01:24
  • Explain which months and days you want data for. Also, instead of two for loops, create a list of URLs and process them one at a time; just a suggestion. Your code is quite messy... – firephil Jan 15 '17 at 01:28
  • There aren't any errors, it just won't put anything into a .txt file. Also, I'm so sorry. I really have no clue what I'm doing. This is all for a class. – Sierra Thomander Jan 15 '17 at 01:41
  • When you post a question, try to provide the specification of what you are asked to achieve. It is very annoying to have to guess, and you will be downvoted 99% of the time. I neutralized your vote count because you are new, but try to make an effort, e.g. "What I'm trying to do is data scrape the"... the what? Don't be lazy. – firephil Jan 15 '17 at 02:20

1 Answer

I encountered the following errors (and fixed them below) when trying to execute your code:

  1. Indentation of the nested loops was invalid.
  2. Missing imports (the lines at the top), but maybe you just excluded them from your paste.
  3. Trying to write "utf-8" encoded strings to an "ascii" file. To fix this I used the codecs module to open the file f as "utf-8".
  4. The file was closed inside the loop, meaning that after writing to it the first time, it'd be closed and then the next write would fail (because it was closed). I moved the line to close the file to the outside of the loops.
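
To show points 3 and 4 in isolation, here is a minimal sketch that needs no network access. It is written in Python 3 syntax (the question's code is Python 2), and the sample rows and the temporary file name are made up for the demonstration; they are assumptions, not data from the site.

```python
import io
import os
import tempfile

# Hypothetical sample rows standing in for the scraped values.
rows = [("20090101", u"-3 \u00b0C"), ("20090102", u"-5 \u00b0C")]

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The bug: the file is closed inside the loop, so the second write fails.
f = io.open(path, "w", encoding="utf-8")
error = None
try:
    for stamp, temp in rows:
        f.write(stamp + u"," + temp + u"\n")
        f.close()  # closed after the first row
except ValueError as exc:
    error = exc  # raised by the write on the already-closed file

# The fix: open once with an explicit encoding and let `with`
# close the file only after the whole loop has finished.
with io.open(path, "w", encoding="utf-8") as f:
    for stamp, temp in rows:
        f.write(stamp + u"," + temp + u"\n")

# Read the file back to confirm every row was written.
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Using `with` is a design choice rather than a requirement: moving `f.close()` after the loops, as I did in the code below, has the same effect here.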

Now as far as I can tell (without you telling us what you actually want this code to do), it's working? At least no errors are immediately popping up...

import codecs
import urllib2
from bs4 import BeautifulSoup

f = codecs.open('wunder-data1.txt', 'w', 'utf-8')

for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break

        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")

        dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)

        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' +str(d)
        else:
            dStamp = str(d)

        timestamp = '2009' + mStamp +dStamp

        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')

f.close()

As the comments on your question have suggested, there are other areas for improvement here which I have not touched on - I've simply tried to get the code you posted executing.
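
One such improvement, following the suggestion in the comments to build a list of URLs first, might look like the sketch below. It is in Python 3 syntax, and `BASE` and `daily_urls` are hypothetical names I made up for illustration. It only builds the timestamps and URLs; the fetching and parsing would stay as in the code above.

```python
import calendar
from datetime import date

# Same URL pattern as in the question, with placeholders for the date parts.
BASE = ("http://www.wunderground.com/history/airport/KBUF/"
        "{y}/{m}/{d}/DailyHistory.html")


def daily_urls(year):
    """Return a list of (timestamp, url) pairs, one per day of `year`."""
    pairs = []
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month),
        # so no hand-written checks for February or 30-day months are needed.
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            # strftime zero-pads the month and day, replacing the len() checks.
            stamp = date(year, month, day).strftime("%Y%m%d")
            pairs.append((stamp, BASE.format(y=year, m=month, d=day)))
    return pairs


pairs = daily_urls(2009)
```

Each pair can then be processed one at a time in a single loop, for example `for stamp, url in pairs:`.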

Bilal Akil
  • Okay, thus far your code is working Bilal Akil, so thank you! Sorry I'm so incompetent. I've never used Python before, and there were no pre-req. for the class for it, but I don't think our teacher realized how hard it would be. I really appreciate your help! – Sierra Thomander Jan 15 '17 at 01:48
  • `import codecs` is necessary to solve the third problem I mentioned. I used the imported `codecs` module 4 lines later to change how you opened the file: `codecs.open('wunder-data1.txt', 'w', 'utf-8')`. It opens the same file as you had before, but this time with the UTF-8 encoding. – Bilal Akil Jan 15 '17 at 01:56
  • Thank you so so much! With some tweaking, I got it all to work! Thank you so much! You've saved my grade. Not all heroes wear capes. – Sierra Thomander Jan 15 '17 at 02:12
  • We're happy to help, that's why we're here :) All we ask is that you do your best when creating questions, by trying to make your question clear and by providing information that'll help us understand what you're trying to do and why you're having trouble - that'll help us help you. Now if you think your question has been answered, then you can use the Answered button to mark it as such :) All the best! – Bilal Akil Jan 15 '17 at 02:19