Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
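A minimal sketch of those three operations (navigating, searching, modifying). It uses a small made-up HTML string and Python's built-in "html.parser". "lxml" or "html5lib" can be named instead if those packages are installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration.
html = """
<html><body>
  <h1 id="title">Example page</h1>
  <ul class="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser; "lxml" or "html5lib" also work if installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute-style access returns the first matching tag.
title = soup.h1.get_text()

# Searching: find_all() returns every matching tag.
hrefs = [a["href"] for a in soup.find_all("a")]

# Modifying: replace a tag's text and add an attribute.
soup.h1.string = "Renamed page"
soup.h1["data-edited"] = "yes"

print(title)  # Example page
print(hrefs)  # ['/a', '/b']
```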

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup and is mostly backward-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

Since version 4.7.0, Beautiful Soup supports a wide range of CSS4 selectors through the soupsieve library, adding to its already rich set of tools for selecting HTML/XML elements. The soupsieve documentation describes the full range of supported CSS selectors and pseudo-classes.
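A short sketch of select() and select_one() on a made-up product list; the class names are hypothetical. :has() is a CSS4 relational pseudo-class, and both it and :not() are evaluated by soupsieve.

```python
from bs4 import BeautifulSoup

# Made-up markup; the class names are hypothetical.
html = """
<div class="product"><span class="name">Pen</span><span class="price">1.50</span></div>
<div class="product sold-out"><span class="name">Ink</span></div>
<div class="product sold-out"><span class="name">Pad</span><span class="price">3.00</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matches, in document order.
names = [s.get_text() for s in soup.select("div.product > span.name")]

# select_one() returns the first match, or None if nothing matches.
first_price = soup.select_one("span.price").get_text()

# :has() keeps only products that have a direct .price child.
priced = [d.select_one(".name").get_text() for d in soup.select("div.product:has(> .price)")]

# :not() drops products carrying the sold-out class.
available = [d.select_one(".name").get_text() for d in soup.select("div.product:not(.sold-out)")]

print(names)      # ['Pen', 'Ink', 'Pad']
print(priced)     # ['Pen', 'Pad']
print(available)  # ['Pen']
```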

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported in a project like this: from bs4 import BeautifulSoup

Notice: Beautiful Soup 3 works only on Python 2.x. Beautiful Soup 4 works on Python 3; releases up to 4.9.3 also supported Python 2 (2.7+), and later releases are Python 3 only.

32305 questions
3 votes, 1 answer

How to fetch/scrape all elements from an HTML "class" that is inside a "span"?

I am trying to scrape data from a website, collecting data from all elements under a "class" that is inside a "span", using this piece of code. But I end up fetching only one element instead of all of them. expand_hits = soup.findAll("a",…
Revanth Tv
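One common cause of that symptom (a guess, since the question's code is cut off) is calling find() on the outer element, which returns only the first match. The sketch below uses hypothetical markup with an assumed class name "hit". find_all() is the current name for the older findAll() alias; either it or a CSS selector collects every match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name "hit" is an assumption.
html = """
<span class="hit"><a href="/1">One</a></span>
<span class="hit"><a href="/2">Two</a><a href="/3">Three</a></span>
<span class="other"><a href="/4">Skip</a></span>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first <span class="hit">, so links in later spans are missed.
first_span_only = [a.get_text() for a in soup.find("span", class_="hit").find_all("a")]

# Looping over every matching span collects the links inside each one.
all_links = []
for span in soup.find_all("span", class_="hit"):
    all_links.extend(a.get_text() for a in span.find_all("a"))

# A single CSS selector gives the same result.
via_css = [a.get_text() for a in soup.select("span.hit a")]

print(first_span_only)  # ['One']
print(all_links)        # ['One', 'Two', 'Three']
```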
3 votes, 2 answers

Scrape everything between two unnested tags

Is it possible to scrape everything between two unnested tags? For instance:

Title 1

span1
user13006535
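A sketch of one approach, assuming the headings and the content between them are siblings rather than nested. The question's own markup did not survive, so the h2/p/span structure below is hypothetical. The idea is to walk next_siblings from the start tag and stop at the next tag of the same kind.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: headings and content are siblings, not nested.
html = "<h2>Title 1</h2><p>para a</p><span>span1</span><h2>Title 2</h2><p>para b</p>"
soup = BeautifulSoup(html, "html.parser")


def tags_between(start, stop_name):
    """Return sibling tags after `start`, stopping at the next <stop_name> tag.

    Bare text nodes between the tags are skipped here.
    """
    collected = []
    for sibling in start.next_siblings:
        if not isinstance(sibling, Tag):
            continue
        if sibling.name == stop_name:
            break
        collected.append(sibling)
    return collected


start = soup.find("h2", string="Title 1")
section = tags_between(start, "h2")
print([t.get_text() for t in section])  # ['para a', 'span1']
```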

3 votes, 6 answers

How to avoid being banned while scraping data from a login-based site?

I'm trying to create a script with which I can parse a few fields from a website without getting blocked. The site I wish to get data from requires credentials to access its content. If it were not for the login, I could have bypassed the rate…
SMTH
3 votes, 2 answers

Convert a website table to a pandas DataFrame (BeautifulSoup doesn't recognize the table)

I want to convert a website table to a pandas DataFrame, but BeautifulSoup doesn't recognize the table (snipped image below). Below is the code I tried, with no luck:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url =…
user2031063
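Assuming the table is actually present in the HTML that requests downloads, its cells can be pulled out with BeautifulSoup and handed to pandas. The markup, the id "stats", and the values below are hypothetical. If find() returns None for the real page, one frequent reason is that the table is built by JavaScript after the page loads, so it never appears in the downloaded HTML.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup; the id "stats" and the values are made up.
html = """
<table id="stats">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ann</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", id="stats")
if table is None:
    # One possible cause: the table is rendered client-side and is absent from the raw HTML.
    raise ValueError("table not found in the downloaded HTML")

rows = table.find_all("tr")
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

df = pd.DataFrame(data, columns=header)
df["Score"] = df["Score"].astype(int)  # cell text arrives as strings
print(df)
```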
3 votes, 1 answer

How can I get namespace information from a tag in beautifulsoup4?

I am trying to parse some XML files that make heavy use of namespaces. Right now I am using beautifulsoup4, and for the most part things are going well. Unfortunately, I am running into some data where it is possible that some tags may have the…
3 votes, 1 answer

Read a table from the web using Python

I'm new to Python and am working on extracting data from the website https://www.screener.in/company/ABB/consolidated/, specifically one table (the last one, Shareholding Pattern). I'm using the BeautifulSoup library for this, but I do not know how to go…
Manny
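One way to reach a specific table is to locate its heading by text and call find_next("table") from there, or to take the last element of find_all("table") when it really is the last table on the page. The markup below only mimics that layout; the tag names and all values are made up, not taken from screener.in.

```python
from bs4 import BeautifulSoup

# Markup that only mimics "several tables, the last one under a heading";
# tag names and all values are made up.
html = """
<h2>Other data</h2>
<table><tr><td>Row</td><td>1</td></tr></table>
<h2>Shareholding Pattern</h2>
<table>
  <tr><th></th><th>Period 1</th></tr>
  <tr><td>Group A</td><td>60.0</td></tr>
  <tr><td>Group B</td><td>40.0</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: find the heading by its text, then the first table after it.
heading = soup.find(lambda tag: tag.name == "h2" and "Shareholding Pattern" in tag.get_text())
table = heading.find_next("table")

# Option 2: if it is reliably the last table on the page, index from the end.
last_table = soup.find_all("table")[-1]

# Read header and data cells row by row, in document order.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")]
print(rows)  # [['', 'Period 1'], ['Group A', '60.0'], ['Group B', '40.0']]
```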
3 votes, 1 answer

Noticing a warning to limit scraped results with BeautifulSoup in Python

I am trying to scrape sales data for recently sold items from eBay with BeautifulSoup in Python, and it works very well with the following code, which finds all prices and all dates of sold items:
price = []
try:
    p =…
lang0s
3 votes, 1 answer

Web scraping website search bars with Python

I am trying to write some code for a personal project where I can scrape data from a site while also using that site's query box. The website I am trying to use is https://www.latlong.net/convert-address-to-lat-long.html and I am trying…
m_ess4
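Many search boxes are plain HTML forms. Assuming that is the case here (a JavaScript-driven box would need a different approach), BeautifulSoup can read the form's action, method, and field names so the query can be sent with an HTTP client. The form, its field names "place" and "token", and the example.com URL are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL and form markup; field names are assumptions.
page_url = "https://example.com/tools/lookup.html"
html = """
<form action="/search" method="post">
  <input type="text" name="place" placeholder="Type address">
  <input type="hidden" name="token" value="abc123">
  <button type="submit">Find</button>
</form>
"""
soup = BeautifulSoup(html, "html.parser")
form = soup.find("form")

# Where and how a browser would submit the form.
target = urljoin(page_url, form.get("action", ""))
method = form.get("method", "get").lower()

# Start from each named input's default value (hidden fields included)...
payload = {inp["name"]: inp.get("value", "") for inp in form.find_all("input") if inp.get("name")}
# ...then fill the visible search box with the query.
payload["place"] = "Example address"

print(method, target)  # post https://example.com/search
print(payload)
```

The resulting payload and URL could then be passed to something like requests.post(target, data=payload); that network step is left out of the sketch.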
3 votes, 2 answers

Extract data from