I am having trouble extracting the text I-want-ya from:

<div class="field">
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
</div>

My solution so far:

import robobrowser
from bs4 import BeautifulSoup

browser = robobrowser.RoboBrowser(parser='html.parser')
browser.open(url)
soup = BeautifulSoup(str(browser.parsed), 'html.parser')

parsed_value = soup.select('div.labelx + .input')

Is there a way to get the I-want-ya value from:

  <div class="input">I-want-ya</div>

by going through its sibling div, specifically the one that has class="labelx" and a child a with attribute title="Group"?

ap3x

2 Answers

UPDATED: Now accounts for multiple matches

from bs4 import BeautifulSoup

s = '''<div class="field">
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
   <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
   <div class="input">I-want-you-2</div>
</div>'''

soup = BeautifulSoup(s, 'html.parser')

divs = soup.find_all('div', attrs={'class': 'labelx'})
for div in divs:
    # find() returns None instead of raising, so test its result directly.
    if div.find('a', {'title': 'Group'}):
        print(div.find_next('div', {'class': 'input'}).text)
    else:
        print('No match.')

Gives:

I-want-ya
I-want-you-2
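
A slightly stricter variant, offered as a sketch rather than a drop-in replacement: find_next() searches everything that follows in the document, so it could pick up an input from a later field. find_next_sibling() only looks at following siblings within the same parent. The function name group_values and the sample markup with a title="Other" label are made up for illustration.

```python
from bs4 import BeautifulSoup


def group_values(html, title="Group"):
    """Return the text of the div.input that follows each matching label."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for label in soup.find_all("div", class_="labelx"):
        # Skip labels that do not contain an <a> with the wanted title.
        if label.find("a", attrs={"title": title}) is None:
            continue
        # Only following siblings are considered, not the whole document.
        value = label.find_next_sibling("div", class_="input")
        if value is not None:
            values.append(value.get_text(strip=True))
    return values


# Hypothetical sample: the first label has a different title and is skipped.
sample = '''<div class="field">
   <div class="labelx"><a class="clickme" title="Other">* Other</a></div>
   <div class="input">skip-me</div>
   <div class="labelx"><a class="clickme" title="Group">* Group</a></div>
   <div class="input">I-want-ya</div>
</div>'''

print(group_values(sample))
```

Note that find_next_sibling() with a filter returns the first matching sibling, which is not necessarily the immediately adjacent one.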
rahlf23
  • I only needed to add a conditional statement: if div.find('a', {'title': 'Group'}): and everything works perfectly. Thanks a lot :) – ap3x Mar 08 '18 at 21:40

Assuming that I understand you correctly:

  • Find the div element with the desired class.
  • Ask for all of its following siblings, take the first one, then get its text.

>>> HTML = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...     <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_sib_div = soup.find('div', attrs={'class': 'labelx'})
>>> first_sib_div.fetchNextSiblings()[0].text
'I-want-ya'

Edit: This is what it should have been.

>>> HTML = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...     <div class="input">I-want-ya</div>
... </div>'''
>>> import bs4
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> first_div_link = soup.select('div.labelx > a[title="Group"]')[0]
>>> first_div_link.findParent().fetchNextSiblings()[0].text
'I-want-ya'
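
Closer to the selector attempted in the question, the whole lookup can probably be written as a single CSS selector. This is a sketch that assumes a Beautiful Soup version whose select() is backed by Soup Sieve (4.7 or later), since it relies on the :has() pseudo-class; it uses html.parser only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

HTML = '''\
<div class="field">
    <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
    <div class="input">I-want-ya</div>
</div>'''

soup = BeautifulSoup(HTML, "html.parser")

# div.labelx that has a direct child <a title="Group">,
# then the div.input immediately following it.
match = soup.select_one('div.labelx:has(> a[title="Group"]) + div.input')

# select_one() returns None when nothing matches, so guard before reading text.
text = match.get_text() if match is not None else None
print(text)
```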

Addendum: Added in response to question from rahlf23.

>>> s = '''\
... <div class="field">
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
...         <div class="input">I-want-ya</div>
...     <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>     
...         <div class="input">I-want-ya-too</div>
... </div>'''
>>> soup = bs4.BeautifulSoup(s, 'lxml')
>>> for item in soup.select('div.labelx > a[title="Group"]'):
...     item.findParent().fetchNextSiblings()[0].text
...     
'I-want-ya'
'I-want-ya-too'
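
In a script, as opposed to the interactive session above, bare expressions are not echoed, so the values have to be collected or printed explicitly. A minimal sketch of the same loop, using the current method names find_parent and find_next_sibling in place of the older findParent and fetchNextSiblings aliases, and html.parser instead of lxml to avoid the extra dependency:

```python
from bs4 import BeautifulSoup

s = '''\
<div class="field">
    <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
    <div class="input">I-want-ya</div>
    <div class="labelx"><a class="clickme" href="#h_group123" rel="#h_group123" title="Group">* Group</a></div>
    <div class="input">I-want-ya-too</div>
</div>'''

soup = BeautifulSoup(s, "html.parser")

# For each matching link: go up to its div.labelx parent,
# then take the next sibling div and read its text.
values = [
    link.find_parent("div").find_next_sibling("div").get_text()
    for link in soup.select('div.labelx > a[title="Group"]')
]
print(values)
```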
Bill Bell