Python webscraper outputs brackets with the number

Question

I'm running Python 3.3 on Windows. The code below goes to Yahoo Finance, pulls the stock price, and prints it. The problem I'm running into is that it outputs:

['540.04']

I just want the number so I can turn it into a float and use it in formulas. I tried using the float function, but that didn't work. I think I need some line of code to remove the brackets and apostrophes.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    htmlfile = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")

    Thefind = re.compile('<span id="yfs_l84_aapl">(.+?)</span>')

    msg = htmlfile.read()

    price = Thefind.findall(str(msg))

    print(price)
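The brackets and quotes are not part of the number: `findall()` returns a list, and printing a list shows the `repr()` of each item inside it. A minimal sketch, using a hard-coded stand-in for the downloaded page (the `sample_page` string is hypothetical, so no network access is needed), shows the difference and why calling `float()` on the whole list fails:

```python
import re

# Hypothetical stand-in for the downloaded HTML
sample_page = '<span id="yfs_l84_aapl">540.04</span>'

pattern = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')
price = pattern.findall(sample_page)

print(price)     # ['540.04']  a list holding one string
print(price[0])  # 540.04      the string itself

try:
    float(price)  # float() needs a string or a number, not a list
except TypeError as err:
    print("TypeError:", err)

value = float(price[0])  # index into the list first, then convert
print(value)  # 540.04
```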

Steinar Lima · Accepted Answer · 2014-01-08T03:46:49.787

0

The beautiful thing about BeautifulSoup is that you don't have to use regexp to parse HTML data.

This is the correct way of using BS:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")
    soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
    my_span = soup.find('span', {'id': 'yfs_l84_aapl'})  # "l84" starts with a lowercase L
    print(my_span.text)

Which yields

540.04
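If BeautifulSoup isn't installed, a similar lookup by `id` can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not what the answer above uses, and the `SpanTextFinder` class, `span_text` function, and `sample_page` string are hypothetical. Like `soup.find()`, it returns `None` when no element matches, which is what happens when the id is mistyped (for example `yfs_184_aapl`, with a digit one instead of a lowercase L), so it is worth checking for `None` before calling `float()`:

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Hypothetical helper: collect the text of the first <span> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the matching span (counts nested spans)
        self.found = False   # becomes True once the matching span is seen
        self.parts = []      # text chunks collected from inside the span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a span nested inside the target span
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def result(self):
        # Like soup.find(), report None when nothing matched
        return "".join(self.parts) if self.found else None


def span_text(html, span_id):
    finder = SpanTextFinder(span_id)
    finder.feed(html)
    finder.close()
    return finder.result()


sample_page = '<p>AAPL <span id="yfs_l84_aapl">540.04</span></p>'

text = span_text(sample_page, "yfs_l84_aapl")  # lowercase L
if text is None:
    print("span not found; check the id spelling")
else:
    print(float(text))  # 540.04

print(span_text(sample_page, "yfs_184_aapl"))  # digit one instead of L: None
```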


answered Jan 08 '14 at 03:08

Steinar Lima


I ended up with the output None; something got messed up. I used print(my_span) because it didn't recognize my_span.text. – user2859603 Jan 08 '14 at 03:23
I got this error: AttributeError: 'NoneType' object has no attribute 'string'. I did a Google search, and it seems to have something to do with the output being None. – user2859603 Jan 08 '14 at 03:36
I found out the real problem: when I wrote l84, I typed a one instead of a lowercase "L". They look so alike in IDLE. I feel bad now. I tried .string as well; both work perfectly. Thank you for the assistance. I'll be looking into Beautiful Soup's capabilities. – user2859603 Jan 08 '14 at 04:02
That's good! :) I've tried parsing HTML with regexp before, and believe me, you won't regret using BS instead! It's like heaven compared to hell. – Steinar Lima Jan 08 '14 at 04:04

Robert Siemer · Answer 2 · 2014-01-08T03:33:18.673

0

The function findall() returns a list. If you just want the first match, pick it like this (msg is bytes, so convert it to a string first, as in the question):

    Thefind.findall(str(msg))[0]

But referring to a specific group is cleaner with search() (not match(), which only matches at the very start of the string):

    Thefind.search(str(msg)).group(1)

Note: group(0) is the whole match, not the first group.
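A small sketch of these options on a hard-coded sample string (`sample_page` is hypothetical). It uses `search()` rather than `match()`, because `match()` only tries to match at position 0 and the span sits in the middle of the page:

```python
import re

sample_page = '<html><body><span id="yfs_l84_aapl">540.04</span></body></html>'
pattern = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')

# findall() returns a list of group-1 strings; [0] takes the first match
first = pattern.findall(sample_page)[0]
print(first)  # 540.04

# match() only tries position 0, so it finds nothing here
print(pattern.match(sample_page))  # None

# search() scans the whole string and returns a Match object (or None)
m = pattern.search(sample_page)
if m is not None:
    print(m.group(1))  # 540.04, the first capture group
    print(m.group(0))  # <span id="yfs_l84_aapl">540.04</span>, the whole match
```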


answered Jan 08 '14 at 03:12

Robert Siemer


The docs say it's a list, not an array: re.findall(pattern, string[, flags]) "Return all non-overlapping matches of pattern in string, as a list of strings." – ThatOneDude Jan 08 '14 at 03:24
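To make the return type concrete: `findall()` always returns a list, and what goes into it depends on how many groups the pattern has (whole matches with no groups, the group's text with one group, a tuple per match with several). A quick check on a made-up string (the `ABC` ticker and its price are invented for illustration):

```python
import re

text = "AAPL 540.04, ABC 12.50"  # made-up sample text

no_groups = re.findall(r"\d+\.\d+", text)
one_group = re.findall(r"AAPL (\d+\.\d+)", text)
two_groups = re.findall(r"([A-Z]+) (\d+\.\d+)", text)

print(no_groups)   # ['540.04', '12.50']  whole matches
print(one_group)   # ['540.04']  just the group's text
print(two_groups)  # [('AAPL', '540.04'), ('ABC', '12.50')]  one tuple per match
```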

score -1 · Answer 3 · answered Jan 08 '14 at 03:03

-1

Use Python's built-in functions:

    float(price.strip("[']"))

answered Jan 08 '14 at 03:03

chazkii


It didn't work because the number is in a list, but I learned a new function. – user2859603 Jan 08 '14 at 03:26
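The `strip()` idea fails because `price` is a list, and lists have no `strip()` method. A minimal sketch (with `price` hard-coded to the value the question printed) comparing a string-cleanup workaround with simply indexing the list:

```python
price = ['540.04']  # what findall() returned in the question

try:
    float(price.strip("[']"))  # lists have no .strip() method
except AttributeError as err:
    print("AttributeError:", err)

# Works, but only by cleaning up the list's printed form
via_strip = float(str(price).strip("[']"))

# Simpler: the list already holds the plain string, so index it
via_index = float(price[0])

print(via_strip, via_index)  # 540.04 540.04
```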
