Parsing child value by sibling and parent using Beautiful Soup

Question

I am having trouble extracting the I-want-ya text from:

<div class="field">
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
</div>

My solution so far:

import robobrowser
from bs4 import BeautifulSoup

browser = robobrowser.RoboBrowser(parser='html.parser')
browser.open(url)
browser = browser.parsed
soup = BeautifulSoup(str(browser), 'html.parser')

parsed_value = soup.select('div.labelx + .input')

Is there a way to get the I-want-ya value:

  <div class="input">I-want-ya</div>

by way of its sibling div, specifically the one that has class="labelx" and a child a with attribute title="Group"?

This appears to be a duplicate of: https://stackoverflow.com/questions/8936030/using-beautifulsoup-to-search-html-for-string — DarthOpto, Mar 08 '18 at 20:39
@DarthOpto: That question doesn't seem to involve searching by sibling, does it? — Bill Bell, Mar 08 '18 at 20:43

rahlf23 · Accepted Answer · 2018-03-08T20:57:41.200

UPDATED: Now accounts for multiple matches

from bs4 import BeautifulSoup

s = '''<div class="field">
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-you-2</div>
</div>'''

soup = BeautifulSoup(s, 'html.parser')

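divs = soup.find_all('div', attrs={'class': 'labelx'})
for div in divs:
    # find() returns None (it does not raise) when no matching <a> exists,
    # so test the result explicitly instead of relying on try/except.
    if div.find('a', {'title': 'Group'}):
        print(div.find_next('div', {'class': 'input'}).text)
    else:
        print('No match.')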

Gives:

I-want-ya
I-want-you-2

I only needed to add a conditional statement: if div.find('a', {'title': 'Group'}): and everything works perfectly. Thanks a lot :) — ap3x, Mar 08 '18 at 21:40
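
The conditional mentioned in the comment above can be sketched as a small helper. This is only an illustration: the function name inputs_for_title and the sample markup with title="Other" are hypothetical, added here to show that a label without the wanted title is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second label has a different title and should be skipped.
HTML = '''<div class="field">
   <div class="labelx"><a class="clickme" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
   <div class="labelx"><a class="clickme" title="Other">* Other</a></div>
   <div class="input">not-me</div>
</div>'''


def inputs_for_title(html, title):
    """Return the text of each div.input that follows a div.labelx
    containing an <a> whose title attribute equals `title`."""
    soup = BeautifulSoup(html, 'html.parser')
    values = []
    for label in soup.find_all('div', class_='labelx'):
        # find() returns None when there is no match; it does not raise.
        if label.find('a', {'title': title}):
            target = label.find_next_sibling('div', class_='input')
            if target is not None:
                values.append(target.get_text())
    return values


print(inputs_for_title(HTML, 'Group'))
```

Because find() signals "no match" with None, an explicit if is what actually filters the labels; a try/except around the call would never trigger.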

Bill Bell · Answer 2 · 2018-03-08T21:13:41.583

Assuming that I understand you correctly:

Find the div element with the desired class.
Ask for all of the siblings that follow it, take the first of them, then get its text.

>>> HTML = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...     <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_sib_div = soup.find('div', attrs={'class': 'labelx'})
>>> first_sib_div.find_next_siblings()[0].text
'I-want-ya'

Edit: This is what it should have been, so that it also requires the child a with title="Group".

>>> HTML = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...     <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_div_link = soup.select('div.labelx > a[title="Group"]')[0]
>>> first_div_link.find_parent().find_next_siblings()[0].text
'I-want-ya'

Addendum: Added in response to question from rahlf23.

>>> s = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...         <div class="input">I-want-ya</div>
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>     
...         <div class="input">I-want-ya-too</div>
... </div>'''
>>> soup = bs4.BeautifulSoup(s, 'lxml')
>>> for item in soup.select('div.labelx > a[title="Group"]'):
...     item.find_parent().find_next_siblings()[0].text
...
'I-want-ya'
'I-want-ya-too'

This does not account for the requirement that it has a child `a` tag with `title="Group"` — rahlf23, Mar 08 '18 at 20:49
Works on my end! Curious for myself here, is there a `select_all()` equivalent that would work on the HTML sample I included in my answer? — rahlf23, Mar 08 '18 at 21:04
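
On the select_all() question above: as far as I know there is no method by that name, because select() already returns a list of every match (select_one() returns only the first). A possible single-selector version, assuming Beautiful Soup 4.7 or later so that the bundled Soup Sieve engine handles :has(), might look like this sketch:

```python
from bs4 import BeautifulSoup

s = '''<div class="field">
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-you-2</div>
</div>'''

soup = BeautifulSoup(s, 'html.parser')

# div.labelx that has a direct child <a title="Group">,
# then the div.input immediately following it as a sibling.
selector = 'div.labelx:has(> a[title="Group"]) + div.input'

# select() returns all matching tags as a list.
values = [tag.get_text() for tag in soup.select(selector)]
print(values)
```

This keeps the whole condition (class, child link title, adjacent sibling) in one selector, at the cost of depending on :has() support in the installed version.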
