
In BeautifulSoup, is there any difference between .text and .get_text()?

Which one should be preferred for getting an element's text?

>>> from bs4 import BeautifulSoup
>>>
>>> html = "<div>text1 <span>text2</span></div>"
>>> soup = BeautifulSoup(html, "html.parser")
>>> div = soup.div
>>> div.text
'text1 text2'
>>> div.get_text()
'text1 text2'
alecxe
  • Basically you can use a custom separator using `get_text()`, and you should use it as `.text` is a private property and not even documented. – Selcuk Feb 19 '16 at 02:41
  • @Selcuk yeah, I personally use `get_text()` all the time, mostly because it is explicitly documented, but I see a lot of bs4 users using `.text` directly and got curious about the downsides of that. Thanks! – alecxe Feb 19 '16 at 02:42
  • Hmm...then what's `div.string`? – Remi Guan Feb 19 '16 at 02:45

1 Answer


It looks like `.text` is just a property that calls `get_text`. Therefore, calling `get_text` without arguments is the same as reading `.text`. However, `get_text` also accepts keyword arguments that change its behavior (`separator`, `strip`, `types`). If you need more control over the result, use the method form.
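
As a rough illustration of that difference, here is a small sketch using markup like the one in the question. The variable names are only illustrative, and the expected values in the comments assume the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

html = "<div>text1 <span>text2</span></div>"
div = BeautifulSoup(html, "html.parser").div

# With no arguments, get_text() returns the same string as .text.
plain = div.get_text()            # 'text1 text2'

# separator is placed between the individual text nodes
# ("text1 " and "text2"), so the trailing space of the first node stays.
joined = div.get_text(separator="|")   # 'text1 |text2'

# strip=True strips whitespace from each text node before joining.
stripped = div.get_text(separator="|", strip=True)   # 'text1|text2'

print(repr(plain), repr(joined), repr(stripped))
```

So `.text` is fine for the simple case, while anything involving a separator or whitespace cleanup needs the `get_text()` call.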

mgilson
  • About the link, I think it's [line 296](http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/element.py#L296) – Momo Feb 12 '23 at 13:00
  • Yep. Probably moved around a bit since I created this post so many years ago. – mgilson Feb 14 '23 at 15:00