
I'm running Python 3.3 on Windows. The code below goes to Yahoo Finance, pulls the stock price, and prints it. The problem I'm running into is that it outputs:

['540.04']

I just want the number so I can turn it into a float and use it with formulas. I tried just using the float function, but that didn't work. I think I have to somehow remove the brackets and apostrophes with some line of code.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    htmlfile = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")

    Thefind = re.compile ('<span id="yfs_l84_aapl">(.+?)</span>')

    msg=htmlfile.read()

    price = Thefind.findall(str(msg))

    print (price)
user2859603

3 Answers


The beautiful thing about BeautifulSoup is that you don't have to use regexp to parse HTML data.

This is the correct way of using BS:

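    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Download the quote page and parse it with the standard-library parser
    html = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")
    soup = BeautifulSoup(html, "html.parser")
    # Look the price element up by its id (lowercase "L" in l84, not a one)
    my_span = soup.find('span', {'id': 'yfs_l84_aapl'})
    print(my_span.text)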

Which yields

540.04
Steinar Lima
  • I ended up with the output None, so something got messed up. I used print(my_span) because it didn't recognize my_span.text. – user2859603 Jan 08 '14 at 03:23
  • I got this error: AttributeError: 'NoneType' object has no attribute 'string'. I did a Google search, and it seems to have something to do with the output being None. – user2859603 Jan 08 '14 at 03:36
  • I found out the true problem was that when I wrote l84, I used a one instead of a lowercase "L". They look so alike in IDLE. I feel bad now. I tried string as well. They both work perfectly. Thank you for the assistance. I'll be looking into Beautiful Soup's capabilities. – user2859603 Jan 08 '14 at 04:02
  • That's good! :) I've tried parsing HTML with regexp before, and believe me, you won't regret using BS instead! It's like heaven compared to hell. – Steinar Lima Jan 08 '14 at 04:04

The function findall() returns a list. If you just want the first group, pick it like this:

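    Thefind.findall(str(msg))[0]

But referring to a specific group is cleaner like this:

    Thefind.search(str(msg)).group(1)

Note: group(0) is the whole match, not the first group. Use search() rather than match() here, because match() only matches at the very beginning of the string. Since msg is bytes and the pattern is a str, msg is converted with str() first, as in the question.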
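
To make the difference between findall(), search(), and the group numbers concrete, here is a minimal sketch. It runs against a hard-coded string instead of the live page, so sample_html is only a hypothetical stand-in and the real markup may differ.

```python
import re

# Hypothetical stand-in for the downloaded page; the real markup may differ.
sample_html = '<div><span id="yfs_l84_aapl">540.04</span></div>'

pattern = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')

# findall() returns a list with the captured group of every match.
all_prices = pattern.findall(sample_html)

# search() scans the whole string for the first match.
found = pattern.search(sample_html)
whole_match = found.group(0)   # the entire <span>...</span> text
first_group = found.group(1)   # only the captured price text

# match() only tries the start of the string, which begins with <div> here.
anchored = pattern.match(sample_html)

# The captured text is a plain string, so float() accepts it directly.
price = float(first_group)
```

Here all_prices is a one-element list, first_group is the bare string, and anchored is None, which is why search() is the safer call for text in the middle of a page.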

Robert Siemer
  • The docs say it's a list and not an array: re.findall(pattern, string[, flags]): Return all non-overlapping matches of pattern in string, as a list of strings. – ThatOneDude Jan 08 '14 at 03:24

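Use Python's built-in functions. Because price is a list, it has no strip() method of its own, so convert it to a string first: float(str(price).strip("[']"))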
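
As a rough comparison, the sketch below converts the list from the question to a float in two ways: by indexing the first element, and by stripping the brackets and quotes from its string form. The helper name to_price is made up for illustration, and returning None for an empty list is just one possible choice.

```python
def to_price(matches):
    """Return the first regex match as a float, or None if nothing matched."""
    if not matches:
        return None
    return float(matches[0])

price = ['540.04']  # the list that findall() produced in the question

by_index = to_price(price)                   # index the first element
by_strip = float(str(price).strip("[']"))    # strip [, ] and ' from "['540.04']"
empty = to_price([])                         # no match on the page
```

Both approaches give the same number for a one-element list, but indexing avoids the round trip through the list's printed form and makes the empty case easy to handle.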

chazkii