I am trying to get sector classifications for stock tickers using Python. How should I go about scraping them?
This is the function I wrote, but I was wondering whether there is a better way to do it.
    import re
    import urllib2
    import HTMLParser

    def get_sector(ticker):
        # Fetch the Google Finance page for this ticker.
        req = urllib2.Request('http://google.com/finance?q=' + str(ticker))
        response = urllib2.urlopen(req)
        the_page = response.read()
        # Group 1 is the link target, group 2 is the sector name.
        output = re.search(r'Sector: <a id=sector href="(.*)" >(.*)</a>>', the_page, flags=re.IGNORECASE)
        if output is not None:
            # Decode HTML entities such as &amp; in the sector name.
            return HTMLParser.HTMLParser().unescape(output.group(2))
        return 'Not Found'
I receive the following error when I call it in a loop over the list of tickers in the Russell 3000:
URLError:
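Since the error only shows up when looping over many tickers, one option is to catch it per ticker and retry, so that a single failed request does not abort the whole run. Below is a minimal sketch of that idea, not a definitive fix. The helper name `collect_sectors` and its `fetch`, `retries`, and `delay` parameters are hypothetical, introduced only for illustration. It catches `IOError` because `urllib2.URLError` is a subclass of it.

```python
import time


def collect_sectors(tickers, fetch, retries=2, delay=1.0):
    """Call fetch(ticker) for each ticker, retrying on network errors.

    Hypothetical helper: `fetch` is any callable such as get_sector.
    Returns (sectors, failed): a dict of results and a list of tickers
    that still failed after all retries.
    """
    sectors = {}
    failed = []
    for ticker in tickers:
        for attempt in range(retries + 1):
            try:
                sectors[ticker] = fetch(ticker)
                break
            except IOError:
                # urllib2.URLError subclasses IOError (OSError on Python 3).
                if attempt == retries:
                    failed.append(ticker)  # give up on this ticker
                else:
                    time.sleep(delay)  # brief pause before retrying
    return sectors, failed
```

It could be called as `sectors, failed = collect_sectors(tickers, get_sector)`, where `tickers` stands for the Russell 3000 list. The `failed` list then shows which symbols never returned a page, which may help narrow down whether the problem is specific tickers or the request rate.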