14

Hi, I want to get the description of an app from the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with the built-in HTML parser
url = "https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"
soup = BeautifulSoup(urlopen(url), "html.parser")
result = soup.find_all("div", {"class":"show-more-content text-body"})

With this code I get the whole content of this class, but I can't get only the text inside it. I tried a lot of things with next_sibling or .text, but they always throw errors (ResultSet has no attribute xxx).

I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anyone help me?

Axisnix
Si Mon

3 Answers

39

Use the .text attribute on the elements; you have a list of results, so loop:

for res in result:
    print(res.text)

.text is a property that proxies for the Element.get_text() method.
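To illustrate why the original attempt failed, here is a minimal sketch. It assumes a small inline HTML string as a made-up stand-in for the Play Store page (not its real markup) and the standard-library `html.parser`. It shows that `find_all()` returns a `ResultSet`, which behaves like a list, and that `.text` on each element matches `get_text()`:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real Play Store page is far larger.
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten: ...</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
result = soup.find_all("div", {"class": "show-more-content text-body"})

# find_all() returns a ResultSet, a list of Tag objects, so the
# text lives on the individual elements and not on the container.
is_list = isinstance(result, list)

# .text and get_text() return the same string for each element.
same = all(res.text == res.get_text() for res in result)

for res in result:
    # A separator plus strip=True collapses the surrounding whitespace.
    print(res.get_text(" ", strip=True))
```

Passing a separator and `strip=True` to `get_text()` is optional; it is used here only to tidy the whitespace left over from the markup's indentation.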

Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
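One caveat with `.find()`: it returns `None` when nothing matches, so `result.text` would raise an `AttributeError` on a page that lacks the `<div>`. A hedged sketch of a guard follows; the helper name `description_text` and the inline markup are illustrative assumptions, not part of bs4 or the real page:

```python
from bs4 import BeautifulSoup


def description_text(html):
    """Return the text of the first matching <div>, or None if absent.

    Illustrative helper only; it is not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = soup.find("div", {"class": "show-more-content text-body"})
    if result is None:
        # find() found no matching element.
        return None
    return result.get_text(" ", strip=True)


# Hypothetical sample inputs: one with the <div>, one without.
found = description_text(
    '<div class="show-more-content text-body"><p>Sie erhalten: ...</p></div>'
)
missing = description_text("<p>no description here</p>")

print(found)
print(missing)
```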
Martijn Pieters
  • Note that according to the documentation, that property does not exist. However, the [get_text()](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text) function does. – Mike 'Pomax' Kamermans Oct 11 '21 at 16:00
  • @Mike'Pomax'Kamermans that’s a documentation bug. `.text` is a [property that calls `.get_text()`](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/614/bs4/element.py#L296). – Martijn Pieters Oct 12 '21 at 02:14
  • hm, I don't see a bug for that over on https://bugs.launchpad.net/beautifulsoup/, and it's a pretty well written section so... if it is, it might still be good to talk about `get_text()` as "the thing" and `.text` as a convenient shortcut to it - if folks want to find out more about this property, they're not going to find it in the docs by searching for `.text`, whereas they _are_ going to find things by searching for `get_text`. – Mike 'Pomax' Kamermans Oct 12 '21 at 04:12
  • @Mike'Pomax'Kamermans: fair enough, added. – Martijn Pieters Oct 29 '21 at 12:25
1

Use the decode_contents() method.

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with the built-in HTML parser
url = "https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"
soup = BeautifulSoup(urlopen(url), "html.parser")
result = soup.find_all("div", {"class":"show-more-content text-body"})

for res in result:
    print(res.decode_contents().strip())

This gives you the inner HTML of the div (markup included), not just its text.
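To make the difference from `get_text()` concrete, here is a small sketch on a made-up inline snippet (an assumption for illustration, not the actual Play Store markup):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a nested tag inside the target <div>.
html = '<div class="show-more-content text-body"><p>Sie <b>erhalten</b>:</p></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

inner_html = div.decode_contents()  # children rendered as markup, tags kept
plain_text = div.get_text()         # text only, tags dropped

print(inner_html)
print(plain_text)
```

So `decode_contents()` is the better fit when the formatting tags should be preserved, while `get_text()` suits the plain-text output the question asks for.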

Mowshon
1

If you want to extract the text from all matching elements into a list, a list comprehension comes in handy:

texts = [r.text.strip() for r in result]
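A self-contained sketch of the same idea, assuming a made-up page with several matching elements (the class names here are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two matching <div> elements and one that should be skipped.
html = """
<div class="text-body"> first </div>
<div class="text-body">second</div>
<div class="other">ignored</div>
"""

soup = BeautifulSoup(html, "html.parser")
result = soup.find_all("div", {"class": "text-body"})

# get_text(strip=True) strips the whitespace around each piece of text.
texts = [r.get_text(strip=True) for r in result]
print(texts)
```

For elements that contain only a single text node, as here, `get_text(strip=True)` gives the same result as `.text.strip()`.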
verwirrt