Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
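As a minimal sketch of the navigating, searching, and modifying described above, the following parses a small inline document with Python's built-in parser. The HTML snippet, its paths, and the variable names are made up for illustration; only the bs4 calls themselves come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<html><body>
  <h1 id="title">Bikes</h1>
  <ul>
    <li><a href="/item/1">Road bike</a></li>
    <li><a href="/item/2">Mountain bike</a></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser from the standard library; "lxml" or
# "html5lib" could be passed instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access returns the first matching tag.
title_text = soup.h1.get_text()

# Searching: find_all returns every matching tag in document order.
links = [a["href"] for a in soup.find_all("a")]

# Modifying: assigning to .string replaces the tag's text content.
soup.h1.string = "Bicycles"
new_title = soup.find(id="title").get_text()

print(title_text, links, new_title)
```

Using the standard-library parser keeps the sketch free of extra dependencies; other parsers may build a slightly different tree from malformed markup.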

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup, and is mostly backwards-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

From version 4.7.0, Beautiful Soup supports a wide range of CSS4 selectors, adding to its already rich collection of tools for selecting HTML/XML elements. The available CSS selectors and pseudo-classes are described in the documentation of the soupsieve library, which bs4 uses for selector support.
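A short sketch of selector-based searching with select() and select_one() might look like the following. The markup and class names are hypothetical, and the example assumes a bs4 release of 4.7.0 or later so that the soupsieve-backed pseudo-classes are available.

```python
from bs4 import BeautifulSoup

# Hypothetical search-result markup, used only for illustration.
html = """
<div class="results">
  <article class="ad"><a href="/ad/1">First</a><span class="price">100</span></article>
  <article class="ad"><a href="/ad/2">Second</a></article>
  <article class="ad sold"><a href="/ad/3">Third</a><span class="price">300</span></article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: links that are direct children of an ad.
hrefs = [a["href"] for a in soup.select("article.ad > a")]

# :has() keeps only the ads that contain a price element.
priced = [ad.a.get_text() for ad in soup.select("article.ad:has(> span.price)")]

# :not() excludes the ads carrying the "sold" class.
unsold = [ad.a["href"] for ad in soup.select("article.ad:not(.sold)")]

# select_one returns the first match, or None when nothing matches.
second = soup.select_one("article:nth-of-type(2) > a").get_text()

print(hrefs, priced, unsold, second)
```

select() always returns a list, so an empty result is an empty list rather than an error, while select_one() needs a None check before its result is used on real pages.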

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported into a project like this: from bs4 import BeautifulSoup

Note: Beautiful Soup 3 works only on Python 2.x. Beautiful Soup 4 works on both Python 2 (2.7+) and Python 3 up to release 4.9.3; later releases support Python 3 only.

32305 questions
3
votes
4 answers

Scraping all URLs from search result page BeautifulSoup

I'm trying to get 100 URLs from the following search result page: https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900 Here's the test code I have: import requests from bs4 import…
3
votes
4 answers

Extracting text from PDF url file with Python

I want to extract text from a PDF file that's on a website. The website contains a link to the PDF doc, but when I click on that link it automatically downloads the file. Is it possible to extract text from that file without downloading it? import fitz #…
taga
3
votes
2 answers

Using Python to identify ETF holdings

I would like to create a web scraper that collects the specific holdings of an ETF. I found that Zacks.com creates a nice list of what I am looking for. I am trying to use BeautifulSoup; however, I am having a difficult time pinpointing the data under…
mshudoma
3
votes
2 answers

Extracting tag from bs4.element.tag returns empty string

I am trying to extract all of the answers from a Quora URL following a tutorial. My code looks like this: url = 'https://www.quora.com/Should-I-move-to-London' r = requests.get(url) soup = BeautifulSoup(r.content, 'html.parser') answers =…
3
votes
2 answers

Parsing ::before with BS4

I tried parsing a web page and ran into ::before in the page HTML. url = 'https://kant-sport.ru/sports/skiing/svobodnoe-katanie/' # Getting whole page page = get(url) # Making soup soup = BS(page.content, 'html.parser') # Getting table table =…
TimNekk
3
votes
1 answer

How Scraping Dynamic Variable Javascript value using BeautifulSoup and Requests

I am scraping a login page; I only need the VAR SALT= variable in the JavaScript tag. This is the website: https://ib.muamalatbank.com/ib-app/loginpage When I read all the answers here, using BeautifulSoup and requests, I can get these 2 variables (maybe because…
Neo
3
votes
2 answers

Unable to grab div tag in Beautiful Soup in Python,

I'm trying to download all the pokemon images available on the official website. The reason I'm doing this is because I want high quality images. Following is the code that I wrote. from bs4 import BeautifulSoup as bs4 import requests request =…
Lawhatre
3
votes
0 answers

Dependency hell with beautifulsoup4 and lxml

I have built a small utility using Python 3.8. Among other things it extracts some data from XML files using beautifulsoup4 and lxml. I use PyCharm and virtualenv for development and my utility works just fine. In order to distribute the util to…
Robert Petermeier
3
votes
6 answers

how can I locate the highlighted element in the picture. I use selenium

I am not sure why I can't locate this element. I am using selenium because the page loads dynamically. Here is my…
Talib Daryabi
3
votes
2 answers

unable to parse html table with Beautiful Soup

I am very new to using Beautiful Soup and I'm trying to import data from the below URL as a pandas dataframe. However, the final result has the correct column names, but no numbers for the rows. What should I be doing instead? Here is my code: from…
Jojo
3
votes
1 answer

Soup not locating proper div tag when searched by text

This is a condensed version of the actual HTML, which has many more tags. html = '''
MasayoMusic
3
votes
1 answer

is there a way to scrape a JavaScript page without selenium in python

Is there a way to scrape a JS-rendered web page with Python BeautifulSoup or lxml without selenium? Thanks.
3
votes
1 answer

BS4 Pagination Results Limited: 10 Rather Than 200

Instead of the output being 10 links on each page, it is only returning the ten links on the last page. In other words, if this was working, the total number of links would be 200. from goose3 import Goose from bs4 import BeautifulSoup from…
zbush548
3
votes
1 answer

How to Scrape Answer from Quizlet Flashcard? BS4 and Requests

Using this page as an example: https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/ How would one hypothetically scrape the text answer from behind the flashcard? It's hidden right now, but when you click on it, it rotates and…
wildcat89