Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
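As a rough illustration of the navigating, searching, and modifying mentioned above, here is a minimal sketch. The HTML snippet and variable names are made up for the example, and it assumes the standard-library parser ("html.parser"), which needs no extra installation.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for illustration.
html = """
<html><body>
  <h1 id="title">Hello</h1>
  <p class="intro">First <b>bold</b> paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""

# "html.parser" is Python's built-in parser; "lxml" or "html5lib"
# can be passed instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk down the tree by tag name.
heading_text = soup.body.h1.string

# Searching: find_all returns every match, find returns the first one.
paragraphs = soup.find_all("p")
intro = soup.find("p", class_="intro")

# Modifying: replace a tag's text, and remove a tag while keeping its contents.
soup.h1.string = "Goodbye"
intro.b.unwrap()

print(heading_text)
print(len(paragraphs))
print(intro.get_text())
```

The same calls work unchanged with the other parsers, although the resulting tree can differ slightly for malformed markup.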

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup, and is mostly backwards-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

Since version 4.7.0, Beautiful Soup supports a wide range of CSS4 selectors, adding to its already rich collection of tools for selecting HTML/XML elements. The supported selectors and pseudo-classes are documented by the soupsieve library, which bs4 uses for CSS selection.
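A short sketch of CSS selection through select() and select_one() may help here. It assumes Beautiful Soup 4.7.0 or later with soupsieve installed; the markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical menu markup, made up for illustration.
html = """
<ul id="menu">
  <li class="item active"><a href="/home">Home</a></li>
  <li class="item"><a href="/docs">Docs</a></li>
  <li class="item"><a href="https://example.com/blog">Blog</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# ID, class, and child combinators.
links = [a["href"] for a in soup.select("#menu li.item > a")]

# select_one returns only the first match (or None).
first_active = soup.select_one("li.active a").get_text()

# Attribute prefix selector: links whose href starts with "https://".
external = [a.get_text() for a in soup.select('a[href^="https://"]')]

# Pseudo-class :not() to exclude the active item.
inactive = [li.get_text() for li in soup.select("li.item:not(.active)")]

print(links)
print(first_active, external, inactive)
```

Selectors like these are often more compact than chained find_all() calls, though both approaches search the same tree.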

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported into a project like this: from bs4 import BeautifulSoup
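One way to check the installation is to import the package, print its version, and parse a trivial document. The sketch below also tries lxml first and falls back to the built-in parser if lxml is not installed; make_soup is a hypothetical helper name used only for this example, not part of bs4.

```python
import bs4
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to html.parser.

    Hypothetical helper for illustration; bs4 raises FeatureNotFound
    when the requested parser is not available.
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


print("Beautiful Soup version:", bs4.__version__)
soup = make_soup("<p>It works</p>")
print(soup.p.get_text())
```

Naming the parser explicitly, rather than relying on the default, keeps results consistent across machines with different parsers installed.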

Notice: Beautiful Soup 3 works only on Python 2.x, while Beautiful Soup 4 supports Python 3; releases up to 4.9.3 also run on Python 2.7.

32305 questions
151 votes, 11 answers

How to scrape only visible webpage text with BeautifulSoup?

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the body text (article) and maybe even a few tab names here and there. I have tried the…
user233864
137 votes, 6 answers

Python BeautifulSoup parsing table

I'm learning Python requests and BeautifulSoup. For an exercise, I've chosen to write a quick NYC parking ticket parser. I am able to get an HTML response which is quite ugly. I need to grab the lineItemsTable and parse all the tickets. You can…
Cmag
126 votes, 9 answers

How to find tags with only certain attributes - BeautifulSoup

How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for? For example, I want to find all <td valign="top"> tags. The following code: raw_card_data = soup.fetch('td', {'valign':re.compile('top')}) gets all of…
Snaxib
122 votes, 3 answers

Can I remove script tags with BeautifulSoup?

Can