Taking specific text from a div in python3

Question

Here is an example of the HTML I am trying to extract from:

    <div class="small subtle link">                      
                    <a href="https://example.com" target=&quot;_blank&quot;  nofollow >Example</a>
                

                
                     This text!
            </div>

I want to grab "This text!", but I keep getting "Example" along with it when I do this:

    myText = soup.findAll('div', {'class': re.compile('small subtle link')})
    if myText:
        extractedText = myText.text.strip()

How do I leave out the text inside the <a> tag?

Does this answer your question? [Only extracting text from this element, not its children](https://stackoverflow.com/questions/4995116/only-extracting-text-from-this-element-not-its-children) — AMC, Nov 04 '20 at 01:09

AMC · Accepted Answer · 2020-11-04T02:10:15.403

There are a few possible solutions; it all depends on the exact behaviour you're looking for.

This produces the correct output:

    from bs4 import BeautifulSoup

    html_src = '''
    <html>
    <body>
    <div class="small subtle link">
        <a href="https://example.com" nofollow="" target='"_blank"'>
            Example
        </a>
        This text!
    </div>
    </body>
    </html>
    '''

    soup = BeautifulSoup(html_src, 'lxml')
    print(soup.prettify())

    div_tag = soup.find(name='div', attrs={'class': 'small subtle link'})

    # recursive=False restricts the search to the div's direct children,
    # so the text nested inside the <a> tag is never visited.
    div_content_text = []
    for curr_text in div_tag.find_all(string=True, recursive=False):
        curr_text = curr_text.strip()
        if curr_text:  # skip whitespace-only text nodes
            div_content_text.append(curr_text)

    print(div_content_text)

Edit: The solution by Sushil is quite clean, too.
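Another option, not part of the original answer, is to delete the <a> tag from a copy of the div and then read whatever text remains. The sketch below assumes the built-in html.parser and a trimmed-down version of the question's markup: decompose() removes the tag together with everything inside it, and get_text(strip=True) drops the whitespace-only leftovers.

```python
import copy

from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup (assumption for illustration).
html = """
<div class="small subtle link">
    <a href="https://example.com">Example</a>
    This text!
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="small subtle link")

# Work on a detached copy so the original tree keeps its <a> tag.
div_copy = copy.copy(div)
for link in div_copy.find_all("a"):
    link.decompose()  # remove the tag and all of its contents

remaining = div_copy.get_text(strip=True)
print(remaining)
```

Copying first is a design choice: if you no longer need the link, you can call decompose() on the original div instead and skip the copy.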

score 0 · Answer 2 · answered Nov 03 '20 at 23:56


This is what you need:

    soup.div.find(string=True, recursive=False)

Answered by IoaTzimas

This still grabs the "Example" text. – soapy, Nov 04 '20 at 00:17
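For reference, here is a quick check against a trimmed-down copy of the question's markup parsed with html.parser (an assumption; the real page may differ). With recursive=False, find(string=True) returns the first direct text child of the div, which is the whitespace before the <a> tag, not "This text!". On a full page, soup.div is also just the first <div> in the document, which may not be the one you want. Passing a function as the string filter skips whitespace-only nodes:

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup (assumption for illustration).
html = """
<div class="small subtle link">
    <a href="https://example.com">Example</a>
    This text!
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="small subtle link")

# The first direct text child is the whitespace between <div> and <a>.
first = div.find(string=True, recursive=False)

# A callable `string` filter is called with each direct text child;
# whitespace-only strings strip to "" and are therefore rejected.
own_text = div.find(string=lambda s: s.strip(), recursive=False)
print(own_text.strip())
```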

score 0 · Answer 3 · answered Nov 04 '20 at 01:26

You can try this:

    print(div.a.find_next_sibling(string=True).strip())

This finds the <a> tag inside the div and prints the first text node that follows it, with the surrounding whitespace stripped.

Here is the full code:

    from bs4 import BeautifulSoup

    html = """
    <div class="small subtle link">
                        <a href="https://example.com" target=&quot;_blank&quot;  nofollow >Example</a>



                         This text!
                </div>
    """

    soup = BeautifulSoup(html, 'html5lib')

    div = soup.find('div', class_="small subtle link")

    # The text node right after </a> holds "This text!" plus surrounding whitespace.
    print(div.a.find_next_sibling(string=True).strip())

Output:

    This text!
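find_next_sibling() returns None when nothing follows the <a> tag (and div.a is None when there is no link), so the one-liner above raises AttributeError on such input. Below is a sketch of a guarded version, wrapped in a hypothetical helper named text_after_link (not part of BeautifulSoup) and assuming html.parser:

```python
from bs4 import BeautifulSoup


def text_after_link(div):
    """Hypothetical helper: return the stripped text node that follows the
    first <a> inside `div`, or None if there is no link or no text after it."""
    link = div.find("a")
    if link is None:
        return None
    following = link.find_next_sibling(string=True)
    if following is None:
        return None
    # Treat a whitespace-only text node the same as a missing one.
    return following.strip() or None


html = '<div class="small subtle link"><a href="https://example.com">Example</a> This text! </div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="small subtle link")
print(text_after_link(div))

# A div that contains only a link yields None instead of raising.
empty = BeautifulSoup('<div><a href="#">Only a link</a></div>', "html.parser").div
print(text_after_link(empty))
```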
