9

There are two values that I am looking to scrape from a website. They appear in the following tags:

<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>

I need the values inside the spans with the classes sp starBig and sp starGryB.

The findAll expression that I am using is:

soup.findAll('span', {'class': ['sp starGryB', 'sp starBig']})

The code runs without any errors, yet no results are displayed.

famousgarkin
RDPD

3 Answers

10

As per the docs, assuming Beautiful Soup 4, matching for multiple CSS classes with strings like 'sp starGryB' is brittle and should not be done:

soup.find_all('span', {'class': 'sp starGryB'})
# [<span class="sp starGryB">2.9</span>]
soup.find_all('span', {'class': 'starGryB sp'})
# []

CSS selectors should be used instead, like so:

soup.select('span.sp.starGryB')
# [<span class="sp starGryB">2.9</span>]
soup.select('span.starGryB.sp')
# [<span class="sp starGryB">2.9</span>]
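The difference is easiest to see when the HTML itself lists the classes in a different order. The following is a minimal sketch, assuming Beautiful Soup 4 with the built-in html.parser; the sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, written in both orders.
html = '''<span class="sp starGryB">2.9</span>
<span class="starGryB sp">3.5</span>'''

soup = BeautifulSoup(html, 'html.parser')

# String matching compares against the exact attribute text,
# so only the span written as "sp starGryB" is found.
by_string = soup.find_all('span', {'class': 'sp starGryB'})

# A CSS selector only requires that both classes are present.
by_selector = soup.select('span.sp.starGryB')

print(len(by_string), len(by_selector))
```

Here the string match finds one span while the selector finds both, which is why the selector form is the safer choice for pages you do not control.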

In your case:

items = soup.select('span.sp.starGryB') + soup.select('span.sp.starBig')

or something more sophisticated like:

items = [i for s in ['span.sp.starGryB', 'span.sp.starBig'] for i in soup.select(s)]
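Both selectors can also be passed as one comma-separated selector group, and the numeric values read from the matched tags. This is a minimal sketch, assuming a recent Beautiful Soup 4 release (where select() accepts selector groups) and the markup from the question; the ratings name is only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = '''<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>'''

soup = BeautifulSoup(html, 'html.parser')

# One selector group matches either kind of span.
items = soup.select('span.sp.starGryB, span.sp.starBig')

# get_text() returns the text inside each tag; convert it to a number.
ratings = [float(item.get_text()) for item in items]
print(ratings)
```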
famousgarkin
  • items = [i for s in ['span.sp.starGryB', 'span.sp.starBig'] for i in soup.select(s): try: print(i.string) except KeyError: pass – RDPD Apr 26 '15 at 14:00
  • items = soup.select('span.sp.starGryB') + soup.select('span.sp.starBig') is working. – RDPD Apr 26 '15 at 14:07
  • @Dixon The second option is just using a [list comprehension](https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions), the expression inside and including `[]`, not a standard for loop. Removed the line split to hopefully improve clarity. – famousgarkin Apr 26 '15 at 14:42
2

Probably there is a better way, but it is eluding me at present. It can be done with css selectors like this:

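import bs4

html = '''<span class="sp starBig">4.1</span>
          <span class="sp starGryB">2.9</span>
          <span class="sp starBig">22</span>'''

# Name the parser explicitly so the result does not depend on what is installed
soup = bs4.BeautifulSoup(html, 'html.parser')

selectors = ['span.sp.starBig', 'span.sp.starGryB']
result = []
for s in selectors:
    # select() returns a list of matching tags; collect them all
    result.extend(soup.select(s))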
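Another option, if requiring the shared sp class is not important, is to pass a list of individual class names to find_all. This is a sketch under that assumption, using Beautiful Soup 4's class_ keyword, which matches a tag when any one of its classes appears in the list.

```python
from bs4 import BeautifulSoup

html = '''<span class="sp starBig">4.1</span>
          <span class="sp starGryB">2.9</span>
          <span class="sp starBig">22</span>'''

soup = BeautifulSoup(html, 'html.parser')

# Each list entry is a single class name, not a space-separated pair.
result = soup.find_all('span', class_=['starBig', 'starGryB'])

# Collect the text of every matched span, in document order.
texts = [tag.get_text() for tag in result]
print(texts)
```

Note that this would also match a span carrying starBig without sp, so it is slightly looser than the selector version.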
mhawke
0

soup.findAll('span', {'class': ['sp starGryB', 'sp starBig']}) This code is helpful and works very well for me.

  • 1
    Hi, how is this different from the code in the question? – no ai please Oct 08 '21 at 02:19