
I'm trying to scrape stock prices and their percent changes from Google Finance using Python 3, but I can't figure out whether something is wrong with the page or with my regex. I suspect that either the SVG graphic or the many script tags throughout the page are making the regex fail to match.

I have tested this regex on many online regex builders/testers and it looks ok. As ok as a regex designed for HTML can be, anyway.

The Google Finance page I'm testing this on is https://www.google.com/finance?q=NYSE%3AAAPL and my Python code is the following:

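import urllib.request
import re

page = urllib.request.urlopen('https://www.google.com/finance?q=NYSE%3AAAPL')
text = page.read().decode('utf-8')
# Raw string, so "\d" and "\(" reach the regex engine unchanged.
m = re.search(r'id="price-panel.*>(\d*\d*\d\.\d\d)</span>.*\((-*\d\.\d\d%)\)', text, re.S)
if m is None:
    print('No match')  # re.search returns None when the pattern is not found
else:
    print(m.groups())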

It should extract the stock price and its percent change. I have also tried Python 2 with BeautifulSoup, like so:

soup.find(id='price-panel')

but it returns None even for this simple query. This is the main reason I suspect something is off with the HTML.
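BeautifulSoup's `find` returns `None` when no element carries the requested id, which can mean the HTML is unusual or simply that the downloaded page does not contain that element at all. A minimal diagnostic sketch, using the standard library's `html.parser` rather than BeautifulSoup (a swapped-in technique), lists the ids actually present in the text you fetched; `IdCollector` and `page_ids` are hypothetical helper names:

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Record every id attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

def page_ids(html_text):
    """Return the set of id values present in html_text."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids

sample = '<div id="price-panel" class="id-price-panel"><span id="ref_22144_l">96.41</span></div>'
print("price-panel" in page_ids(sample))                         # True
print("price-panel" in page_ids("<table id='quotes'></table>"))  # False
```

If `price-panel` is missing from `page_ids(text)` for the downloaded page, the problem is likely the page or URL being fetched rather than the regex or the parser.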

Here's the most important bit of HTML that I'm targeting:

<div id="price-panel" class="id-price-panel goog-inline-block">
<div>
<span class="pr">
<span class="unchanged" id="ref_22144_l"><span class="unchanged">96.41</span><span></span></span>
</span>
<div class="id-price-change nwp goog-inline-block">
<span class="ch bld"><span class="down" id="ref_22144_c">-1.13</span>
<span class="down" id="ref_22144_cp">(-1.16%)</span>
</span>
</div>
</div>
<div>
<span class="nwp">
Real-time:
&nbsp;
<span class="unchanged" id="ref_22144_ltt">3:42PM EDT</span>
</span>
<div class="mdata-dis">
<span class="dis-large"><nobr>NASDAQ
real-time data -
<a href="//www.google.com/help/stock_disclaimer.html#realtime" class="dis-large">Disclaimer</a>
</nobr></span>
<div>Currency in USD</div>
</div>
</div>
</div>
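Because `.*` is greedy and `re.S` lets it cross newlines, the question's pattern matches the *last* price/percent pair after `id="price-panel`, not the first, so a later quote on a large page can win. This is one plausible cause, not a confirmed one. The sketch below runs against a trimmed copy of the markup above plus a hypothetical second quote, comparing the original pattern, a lazy `.*?` version, and an alternative anchored on the span ids (assuming the `22144` part varies per stock):

```python
import re

# Trimmed copy of the question's price-panel markup, followed by a HYPOTHETICAL
# second quote (as a multi-stock page might contain) to show the effect of ".*".
SNIPPET = """<div id="price-panel" class="id-price-panel goog-inline-block">
<span class="unchanged" id="ref_22144_l"><span class="unchanged">96.41</span><span></span></span>
<span class="down" id="ref_22144_cp">(-1.16%)</span>
</div>
<table><tr><td><span>543.21</span></td><td>(-2.00%)</td></tr></table>
"""

# The question's pattern as a raw string, so "\d" and "\(" reach re unchanged.
GREEDY = r'id="price-panel.*>(\d*\d*\d\.\d\d)</span>.*\((-*\d\.\d\d%)\)'
# Same pattern with lazy ".*?": take the FIRST price/percent after the anchor.
LAZY = r'id="price-panel.*?>(\d*\d*\d\.\d\d)</span>.*?\((-*\d\.\d\d%)\)'
# Alternative: anchor on the element ids; "\d+" assumes the number varies per stock.
BY_ID = r'id="ref_\d+_l"><span[^>]*>([\d.,]+)</span>.*?id="ref_\d+_cp">\(([-+]?[\d.,]+%)\)'

greedy = re.search(GREEDY, SNIPPET, re.S).groups()
lazy = re.search(LAZY, SNIPPET, re.S).groups()
by_id = re.search(BY_ID, SNIPPET, re.S).groups()
print(greedy)  # ('543.21', '-2.00%'): the LAST pair in the string
print(lazy)    # ('96.41', '-1.16%')
print(by_id)   # ('96.41', '-1.16%')
```

Testing a pattern against a saved copy of the real page, rather than a hand-picked fragment in an online tester, is what exposes this kind of overshoot.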

I'm wondering if any of you have encountered a similar problem with this page and/or can figure out if there's anything wrong with my code. Thanks in advance!

  • FYI, https://www.quandl.com/help/api-for-stock-data I don't know what you need from Google Finance, but you can probably get it from here. – user2023861 Oct 16 '14 at 20:50
  • @user2023861 Thanks, I'll check it out. I had searched for other sources of data but none that I found had all the stocks I needed. I'm trying to get stocks from exchanges other than NYSE. – Slpk Oct 17 '14 at 14:01

2 Answers


You might try a different URL that will be easier to parse, such as: http://www.google.com/finance/info?q=AAPL

The catch is that Google has said that using this API in an application for public consumption is against their Terms of Service. Maybe there is an alternative that Google will allow you to use?

  • Cool, it certainly is much better than parsing html. The other hidden source I found useful is https://www.google.com/finance/getprices?q=AAPL&i=120&p=5d&f=c&df=cpct&auto=1 – Slpk Oct 17 '14 at 14:04
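Both endpoints return text that is far easier to parse than the full page. Neither format is documented, so the two parsers below are sketches built on assumptions that should be checked against a real response: that `finance/info` returns a JSON array, possibly preceded by `//`; and that `getprices` returns `KEY=VALUE` header lines followed by comma-separated rows, where a leading `a` marks an absolute Unix timestamp and any other first field is an offset counted in `INTERVAL` seconds (with `f=c`, the only value column is assumed to be the close). Both function names are hypothetical.

```python
import json

def parse_info_response(body):
    """Parse a finance/info-style body: a JSON array, optionally prefixed by '//' (assumed)."""
    text = body.strip()
    if text.startswith("//"):
        text = text[2:]
    return json.loads(text)

def parse_getprices(body):
    """Return [(unix_timestamp, close), ...] from a getprices-style body (assumed layout)."""
    interval = None
    base = None
    rows = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("INTERVAL="):
            interval = int(line.split("=", 1)[1])
            continue
        if "=" in line or "%3D" in line:
            continue  # other header lines (KEY=VALUE, possibly URL-encoded as KEY%3DVALUE)
        first, *values = line.split(",")
        if first.startswith("a"):
            base = int(first[1:])  # absolute Unix timestamp
            stamp = base
        elif base is not None and interval is not None:
            stamp = base + int(first) * interval  # offset in whole intervals
        else:
            raise ValueError("offset row seen before an absolute timestamp or INTERVAL")
        rows.append((stamp, float(values[0])))  # first value column, assumed CLOSE for f=c
    return rows

# Hypothetical sample body, for illustration only.
sample = "INTERVAL=120\nCOLUMNS=CLOSE\nDATA=\na1000,96.41\n1,96.50\n"
print(parse_getprices(sample))  # [(1000, 96.41), (1120, 96.5)]
```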

I managed to get it working with BeautifulSoup on the link I originally posted.

Here's the code I finally used:

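# Python 2 (urllib2); requires BeautifulSoup 4 and lxml.
import urllib2
from bs4 import BeautifulSoup

response = urllib2.urlopen('https://www.google.com/finance?q=NYSE%3AAAPL')
html = response.read()
soup = BeautifulSoup(html, "lxml")
panel = soup.find(id='price-panel')
# price-panel > div > span.pr > span#ref_..._l, whose text is e.g. "96.41"
aaplPrice = panel.div.span.span.text
# price-panel > div > div > span.ch > second span, e.g. "(-1.16%)" -> "-1.16%"
aaplVar = panel.div.div.span.find_all('span')[1].string.split('(')[1].split(')')[0]
aapl = aaplPrice + ' ' + aaplVar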
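The attribute chain `.div.span.span` depends on the exact nesting of the panel. A sketch of the same lookup in Python 3 that goes straight to the span ids from the question's markup (`ref_22144_l`, `ref_22144_cp`), using the standard library's `html.parser` instead of BeautifulSoup; the `22144` part is assumed to differ per stock, and `TextById`/`text_by_id` are hypothetical helpers:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TextById(HTMLParser):
    """Collect the text inside elements whose id is in wanted_ids."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted = set(wanted_ids)
        self.found = {}
        self._open = []  # stack of (tag, id-or-None) for currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self.found.setdefault(element_id, "")
        else:
            element_id = None
        self._open.append((tag, element_id))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> hold no text

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name; this tolerates
        # unclosed tags and ignores stray closing tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                return

    def handle_data(self, data):
        # Text counts for every wanted element that encloses it.
        for _tag, element_id in self._open:
            if element_id is not None:
                self.found[element_id] += data

def text_by_id(html, wanted_ids):
    parser = TextById(wanted_ids)
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.found.items()}

html = ('<span class="pr"><span class="unchanged" id="ref_22144_l">'
        '<span class="unchanged">96.41</span><span></span></span></span>'
        '<span class="down" id="ref_22144_cp">(-1.16%)</span>')
values = text_by_id(html, ["ref_22144_l", "ref_22144_cp"])
price = values["ref_22144_l"]                 # '96.41'
percent = values["ref_22144_cp"].strip("()")  # '-1.16%'
print(price, percent)  # 96.41 -1.16%
```

Looking elements up by id keeps working if wrapper divs or spans are added or removed, as long as the ids themselves stay the same.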

I couldn't get it working with BeautifulSoup before because I was actually trying to parse the table on https://www.google.com/finance?q=NYSE%3AAAPL%3BNYSE%3AGOOG, not the page I posted. Neither method described in my question has worked on that page.
