Could anyone help me revise this Python program so that it correctly submits information to the "Date Range" query and then extracts the returned "Close" data? I am scraping data from the following URL:
http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
This is my current code, which prints "[ ]":
from lxml import html
import requests

def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
    url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % (symbol)
    form_data = {
        'a': stMonth,  # 00 is January, 01 is Feb., etc.
        'b': stDate,
        'c': stYear,
        'd': enMonth,  # 00 is January, 01 is Feb., etc.
        'e': enDate,
        'f': enYear,
        'submit': 'submit',
    }
    response = requests.post(url, data=form_data)
    tree = html.document_fromstring(response.content)
    p = tree.xpath('//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]/tbody/tr/td/table/tbody/tr[2]/td[7]/text()')
    print p

historic_quotes('baba',00,11,2010,00,11,2012)
I am an overall Python novice, and greatly appreciate any and all help. Thanks for reading!
Also, I realize now that the HTML source may be of help, but it is huge, so here is an XPath to the relevant part:
//*[@id="daterange"]/table
Expected output is a list of the "Close" values for the different dates. As previously stated, the current output is just "[ ]". I believe something may be incorrect in the form_data, perhaps the "submit" entry.
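To make the form_data suspicion concrete, here is a minimal sketch of how the same fields would look if they were encoded into the URL's query string rather than sent in a POST body. Two things in it are assumptions, not anything confirmed about the page: that the "Date Range" form is submitted with GET, and that the site wants two-digit months. The helper name build_query is hypothetical. One fact it does illustrate: the integer literal 00 is just 0, so the value has to be zero-padded explicitly if "00" is what the form needs.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_query(symbol, st_month, st_date, st_year, en_month, en_date, en_year):
    """Encode the date-range fields as a query string (hypothetical helper)."""
    params = [
        ('s', symbol),
        ('a', '%02d' % st_month),  # zero-padded so 0 becomes "00" (assumed format)
        ('b', st_date),
        ('c', st_year),
        ('d', '%02d' % en_month),  # zero-padded so 0 becomes "00" (assumed format)
        ('e', en_date),
        ('f', en_year),
    ]
    # A list of pairs keeps the fields in a fixed, readable order.
    return urlencode(params)


query = build_query('baba', 0, 11, 2010, 0, 11, 2012)
print(query)
```

If the form does turn out to use GET, a string like this could be appended after the "?" in the URL and fetched with requests.get instead of requests.post. That is only one possible explanation for the empty list.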