4

I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using the urllib2 package to fetch each page and a regex to extract the price data.

When I leave my Python script running, it works for a few minutes and then starts throwing an exception: [HTTP Error 503: Service Unavailable]

I guess this happens because the web server treats the frequent page requests as coming from a robot and starts returning this error after a while.

Is there a way around this, e.g. deleting or creating some cookie?

Even better would be an API provided by Google. I want to do this in Python because the complete app is in Python, but if nothing is available in Python, I can consider alternatives. This is the Python method I call in a loop to get the data, with a few seconds of sleep between calls:

import re
import urllib2

def getPriceFromGOOGLE(self, symbol):
    """
    Gets the last traded price from Google for the given security.
    Returns 0.0 if the page cannot be fetched or parsed.
    """
    toReturn = 0.0
    try:
        base_url = 'http://google.com/finance?q='
        req = urllib2.Request(base_url + symbol)
        content = urllib2.urlopen(req).read()
        # Escape the symbol so characters such as '.' are matched literally.
        namestr = 'name:"' + re.escape(symbol) + '",cp:(.*),p:(.*),cid(.*)}'
        m = re.search(namestr, content)
        if m:
            data = str(m.group(2).strip().strip('"'))
            price = data.replace(',', '')
            toReturn = float(price)
        else:
            print 'ERROR ' + str(symbol) + ' --- ' + str(content)
    except Exception, exc:
        print 'Exc: ' + str(exc)
    # Return after the try block; a return inside finally would also
    # swallow KeyboardInterrupt and SystemExit.
    return toReturn
user424060

4 Answers

5

The question is quite old, but the selected answer is no longer valid: the API has been deprecated.

There is an open source project that scrapes all companies from Google Finance and matches them with their current prices: http://scrape-google-finance.compunect.com/
The project solves most of these issues: it includes caching and IP management, and it runs stably without getting blocked.
It uses the internal finance company-matching API to scrape companies and the chart API to get prices. However, it is PHP code, not Python. You can still learn how it solved these tasks and adapt the approach.
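
As one illustration of the caching idea in Python, here is a minimal sketch. It is not a port of the PHP project: the class name `QuoteCache`, the `fetch_price` callable, and the 60-second lifetime are all assumptions made for this example. The point is only that repeated lookups of the same symbol within a short window can be answered locally, which lowers the request rate the server sees.

```python
import time


class QuoteCache:
    """Cache prices per symbol for `ttl` seconds so repeated lookups skip the network."""

    def __init__(self, fetch_price, ttl=60.0, clock=time.monotonic):
        # fetch_price is a hypothetical callable: symbol -> float.
        self._fetch_price = fetch_price
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # symbol -> (time fetched, price)

    def get(self, symbol):
        now = self._clock()
        entry = self._entries.get(symbol)
        # Serve from the cache while the entry is younger than ttl.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        price = self._fetch_price(symbol)
        self._entries[symbol] = (now, price)
        return price
```

In use, you would pass your own scraping function as `fetch_price` and call `cache.get(symbol)` from the polling loop instead of hitting the site every time.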

John
  • Good answer, but the code is written for use with us-proxies.com, a professional IP address provider. Apparently the scrapers use different IP addresses to avoid getting shut out. And that site charges approx. $30/month for five IP addresses, $145/month for 30 IP addresses. – zipzit Jun 22 '14 at 12:27
  • You are right. The code is open source, so if you have a cheaper or your own IP solution you want to try, just take the parts you need. For Google Finance you won't need many IPs, depending on what exactly you want to do. – John Jun 22 '14 at 19:12
3

To get around most rate limiting or bot detection by the likes of Google, Wikipedia, or Yahoo, spoof your user agent.

This makes your script's requests appear to come from a Google Chrome browser (here, Chrome 11 on Windows).

import urllib2

# url is the page to fetch, e.g. base_url + symbol from the question.
headers = {'User-Agent' : "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24"}
req = urllib2.Request(url, None, headers)
content = urllib2.urlopen(req).read()
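
On Python 3, where `urllib2` was merged into `urllib.request`, the same idea might look like the sketch below. The helper names `build_request` and `fetch` are hypothetical and exist only for this example; the user-agent string is the one used above, and any browser-like value would be set the same way.

```python
import urllib.request

# User-agent string taken from the snippet above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 "
              "(KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24")


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch url and return the raw response body as bytes (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Keeping the request construction separate from the network call makes it easy to check the headers without sending anything.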
Aphex
3

Yahoo Finance is also a good place to get financial information, and it covers more countries and stocks.

For Python 2, you can use ystockquote. For Python 3, you can use yfq, which I rewrote from ystockquote.

To get current quotes for Google and Intel:

>>> import yfq
>>> yfq.get_price('GOOG+INTL')
{'GOOG': '600.25', 'INTL': '22.25'}

To get historical quotes for Yahoo from March 1, 2012 to March 3, 2012:

>>> yfq.get_historical_prices('YHOO','20120301','20120303')
[['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], ['2012-03-02', '14.89', '14.92', '14.66', '14.72', '9164900', '14.72'], ['2012-03-01', '14.89', '14.96', '14.79', '14.93', '12283300', '14.93']]
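
The historical result shown above is a list of rows whose first row is the header and whose values are all strings. If you want numbers keyed by column name, a small conversion step helps. The sketch below is not part of yfq: the helper `rows_to_records` is hypothetical, and it assumes every column except `Date` is numeric, as in the output shown.

```python
def rows_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by column name."""
    header, *data = rows
    records = []
    for row in data:
        record = dict(zip(header, row))
        # Assumption: every column except 'Date' holds a numeric string.
        for key in header:
            if key != "Date":
                record[key] = float(record[key])
        records.append(record)
    return records


# The rows returned in the example above.
rows = [
    ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'],
    ['2012-03-02', '14.89', '14.92', '14.66', '14.72', '9164900', '14.72'],
    ['2012-03-01', '14.89', '14.96', '14.79', '14.93', '12283300', '14.93'],
]
records = rows_to_records(rows)
```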
angelo
  • `yfq` -> 404, `ystockquote` -> site can't be reached. There is a [`yfinance`](https://github.com/ranaroussi/yfinance) package for this that is fairly actively maintained. – Dmitriy Zub Apr 15 '22 at 08:16
2

There is a Google Finance API:

http://code.google.com/apis/finance/docs/2.0/developers_guide_protocol.html

And there is a Python client library for it:

http://code.google.com/p/gdata-python-client/

AJ.
  • The Google Finance API has been officially deprecated as of May 26, 2011 and will be shut down on October 20, 2012. :( https://developers.google.com/finance/ – gliptak Oct 14 '12 at 16:51