0

I would like to do the following.

1) Goto https://www.wunderground.com/history

2) Submit the form with the following values:

Location = 'Los Angeles, CA'

Month = 'November'

Day = '02'

Year = '2017'

3) Retrieve the response and save to a file

Here is what I have at the moment, but does not work:

import requests
url = 'https://www.wunderground.com/history'
payload = {'code': 'Los Angeles, CA', 'month': 'November', 'day': '2', 'year':'2017'}
r = requests.post(url, params=payload)

with open('test.html', 'w') as f:
    f.write(r.text)

I'm not getting the expected response and not sure If I am using requests properly or not.

I know there's an API from wunderground but prefer not to use it at the moment.

The content of test.html is basically the original page with no historical data.

I'm expecting this page:

https://www.wunderground.com/history/airport/KCQT/2017/11/2/DailyHistory.html?req_city=Los+Angeles&req_state=CA&req_statename=California&reqdb.zip=90012&reqdb.magic=1&reqdb.wmo=99999

Le Vu
  • 3
  • 2

2 Answers2

1

Try requests.get instead of requests.post as it seems that parameters are expected to be provided with GET.

import requests

url = 'https://www.wunderground.com/history'
payload = {'code': 'Los Angeles, CA', 'month': 'November', 'day': '2', 'year':'2017'}

r_get = requests.get(url, params=payload)
r_post = requests.post(url, params=payload)

assert r_get != r_post

More info on GET vs POST: https://www.w3schools.com/tags/ref_httpmethods.asp

Guybrush
  • 2,680
  • 1
  • 10
  • 17
1

You cannot blindly send payloads to some websites and expect a good result. First, look at the source code of the form element. I removed some unimportant parts:

<form action="/cgi-bin/findweather/getForecast" method="get" id="trip">
    <input type="hidden" value="query" name="airportorwmo" />
    <input type="hidden" value="DailyHistory" name="historytype" />
    <input type="hidden" value="/history/index.html" name="backurl" />
    <input type="text" value="" name="code" id="histSearch" />

    <select class="month" name="month">
        <option  value="1">January</option>
        <option  value="2">February</option>
        ...
        <option  value="12">December</option>
    </select>

    <select class="day" name="day">
        <option>1</option>
        <option>2</option>
        ...
        <option>31</option>
    </select>

    <select class="year" name="year">
        <option>2018</option>
        <option>2017</option>
        ...
        <option>1945</option>
    </select>

    <input type="submit" value="Submit" class="button radius" />
</form>

First, from the method attribute of the form element you can see that you have to use the GET method, not POST, to send the payload. Second, from the action attribute you can also see that you should send that payload to this specific URL:

https://www.wunderground.com/cgi-bin/findweather/getForecast

The payload itself are not just values you want to send. In many cases there are additional values that has to be sent in order for the web server to respond correctly. It is usually best to either send everything (basically every name attribute) or inspect what the website actually sends.

This code works for me:

import requests

URL = 'https://www.wunderground.com/cgi-bin/findweather/getForecast'
CODE = 'Los Angeles, CA'
DAY = 2
MONTH = 11
YEAR = 2017

params = {
    'airportorwmo': 'query',
    'historytype':  'DailyHistory',
    'backurl':      '/history/index.html',
    'code':         CODE,
    'day':          DAY,
    'month':        MONTH,
    'year':         YEAR,
    }

r = requests.get(URL, params=params)
print(r.text)
Jeyekomon
  • 2,878
  • 2
  • 27
  • 37
  • 1
    Thank you so much! I was doing a GET to to '/cgi-bin/findweather/getForecast' url but I neglected to include the hidden forms. That was the missing piece. – Le Vu Apr 09 '18 at 17:06