Get text of children in a div with beautifulsoup

Question

Hi, I want to get the description of an app in the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

With this code I get the whole content of this class, but I can't get only the text in it. I tried a lot of things with next_sibling or .text, but it always throws errors (ResultSet has no attribute xxx).

I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anyone help me?

Martijn Pieters · Answer 1 · 2021-10-29T12:25:00.903

39

Use the .text attribute on the elements; you have a list of results, so loop:

for res in result:
    print(res.text)

.text is a property that proxies for the Element.get_text() method.
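Because .text takes no arguments, calling get_text() directly is useful when you want to control how the text fragments are joined. A minimal sketch, assuming a made-up HTML snippet in place of the real Play Store page (whose markup may differ) and Python's built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Play Store markup; the real page may differ.
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten:</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# .text and get_text() with no arguments return the same string.
same = div.text == div.get_text()

# separator joins the text fragments; strip=True trims each fragment
# and drops the ones that contain only whitespace.
description = div.get_text(separator=" ", strip=True)
print(description)
```

With this snippet, description should be a single line without the indentation and newlines that plain .text would keep.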

Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
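One caveat with .find(): it returns None when nothing matches, so result.text would then raise an AttributeError. A small sketch of a guard, assuming invented HTML snippets rather than the live page; extract_description is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def extract_description(html):
    """Return the text of the description div, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"class": "show-more-content text-body"})
    if div is None:  # find() found no matching element
        return None
    return div.get_text(strip=True)


# Illustrative inputs only; neither is the real Play Store markup.
found = extract_description(
    '<div class="show-more-content text-body">Die Android App</div>'
)
missing = extract_description("<p>no description here</p>")
print(found, missing)
```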

edited Oct 29 '21 at 12:25

answered Jan 02 '14 at 18:56

Martijn Pieters


Note that according to the documentation, that property does not exist. However, the [get_text()](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text) function does. – Mike 'Pomax' Kamermans Oct 11 '21 at 16:00
@Mike'Pomax'Kamermans that’s a documentation bug. `.text` is a [property that calls `.get_text()`](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/614/bs4/element.py#L296). – Martijn Pieters Oct 12 '21 at 02:14
hm, I don't see a bug for that over on https://bugs.launchpad.net/beautifulsoup/, and it's a pretty well written section so... if it is, it might still be good to talk about `get_text()` as "the thing" and `.text` as a convenient shortcut to it - if folks want to find out more about this property, they're not going to find it in the docs by searching for `.text`, whereas they _are_ going to find things by searching for `get_text`. – Mike 'Pomax' Kamermans Oct 12 '21 at 04:12
@Mike'Pomax'Kamermans: fair enough, added. – Martijn Pieters Oct 29 '21 at 12:25

score 1 · Answer 2 · answered Dec 06 '21 at 14:13

Use the decode_contents() method.

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

for res in result:
    print(res.decode_contents().strip())

You'll get the inner HTML of the div, that is, its contents with the markup of any child tags included.
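To make the difference from .text concrete, here is a short comparison. It is only a sketch on an invented snippet (not the real Play Store page), parsed with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; not the real Play Store page.
html = '<div class="show-more-content text-body"><p>Sie <b>erhalten</b>:</p></div>'

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

inner_html = div.decode_contents()  # child markup kept, returned as a string
plain_text = div.get_text()         # tags removed, text only

print(inner_html)
print(plain_text)
```

So decode_contents() fits when the formatting tags matter, while get_text() or .text fits when only the readable text is wanted.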

score 1 · Answer 3 · answered Jul 07 '22 at 09:46


If you want to extract the text from all elements into a list, a list comprehension can come in handy:

texts = [r.text.strip() for r in result]
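This also shows why the error in the question appears: find_all() returns a list-like ResultSet, which has no .text of its own, so the attribute has to be read from each element. A runnable sketch on invented markup (the real page may look different):

```python
from bs4 import BeautifulSoup

# Invented markup with two matching divs; not the real page.
html = """
<div class="show-more-content text-body"> first </div>
<div class="show-more-content text-body"> second </div>
"""

soup = BeautifulSoup(html, "html.parser")
result = soup.find_all("div", {"class": "show-more-content text-body"})

try:
    result.text  # the collection itself has no .text
    raised = False
except AttributeError:
    raised = True

# Read .text from each element instead of from the collection.
texts = [r.text.strip() for r in result]
print(raised, texts)
```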


verwirrt

