Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
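The snippet below is a minimal sketch of navigating, searching, and modifying the parse tree. It uses the built-in "html.parser" on a small inline HTML string. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1 id="title">Bikes</h1>
  <ul class="items">
    <li><a href="/bike/1">Road bike</a></li>
    <li><a href="/bike/2">Mountain bike</a></li>
  </ul>
</body></html>
"""

# "html.parser" is built into Python; "lxml" or "html5lib" can be named here
# instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access returns the first tag with that name.
print(soup.h1.get_text())  # Bikes

# Searching: find_all() returns every matching tag.
names = [a.get_text() for a in soup.find_all("a")]
hrefs = [a["href"] for a in soup.find_all("a")]
print(names)  # ['Road bike', 'Mountain bike']

# Modifying: replace a tag's text and append a new element to the list.
soup.h1.string = "Bicycles"
new_item = soup.new_tag("li")
new_item.string = "Gravel bike"
soup.find("ul", class_="items").append(new_item)
print(len(soup.find_all("li")))  # 3
```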

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup and is mostly backwards-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

Since version 4.7.0, Beautiful Soup supports a wide range of CSS4 selectors, adding to its already rich collection of tools for selecting HTML/XML elements. The supported CSS selectors and pseudo-classes are documented by soupsieve, the library bs4 uses to implement them.
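The CSS selector support can be shown with select() and select_one(). The example combines an ID selector, the :not() and :nth-of-type() pseudo-classes, and an attribute prefix match. The table and links are hypothetical sample markup.

```python
from bs4 import BeautifulSoup

html = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr class="row"><td>Tea</td><td>3</td></tr>
  <tr class="row sold-out"><td>Coffee</td><td>4</td></tr>
  <tr class="row"><td>Juice</td><td>5</td></tr>
</table>
<a href="https://example.com/a">external</a>
<a href="/local">local</a>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matches: here, the first <td> of every
# .row that does not also have the .sold-out class.
available = [td.get_text() for td in
             soup.select("#prices tr.row:not(.sold-out) td:nth-of-type(1)")]
print(available)  # ['Tea', 'Juice']

# Attribute selector: links whose href starts with "https://".
external = [a["href"] for a in soup.select('a[href^="https://"]')]
print(external)  # ['https://example.com/a']

# select_one() returns the first match (or None if nothing matches).
first_header = soup.select_one("#prices th").get_text()
print(first_header)  # Item
```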

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported in a project like this: from bs4 import BeautifulSoup

Note: Beautiful Soup 3 works only on Python 2.x, while Beautiful Soup 4 works on both Python 2 (2.7+) and Python 3. Version 4.9.3 was the last Beautiful Soup 4 release to support Python 2.

32305 questions

4 answers

Scraping all URLs from search result page BeautifulSoup

I'm trying to get 100 URLs from the following search result page: https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900 Here's the test code I have: import requests from bs4 import…

python web-scraping beautifulsoup

asked Dec 13 '20 at 13:49

scrape_noob
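One possible approach is sketched here on static sample HTML. It selects the result links, resolves relative hrefs against the page URL with urllib.parse.urljoin, and drops duplicates. The div.result class and the example.com URLs are hypothetical stand-ins. They do not reflect the real structure of the willhaben page, which may also load some of its results with JavaScript.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.example.com/search"  # hypothetical page URL

html = """
<div class="result"><a href="/item/1">One</a></div>
<div class="result"><a href="/item/2">Two</a></div>
<div class="result"><a href="https://other.example.org/item/3">Three</a></div>
<div class="ad"><a href="/promo">Ad</a></div>
"""

def result_urls(page_html, base_url):
    """Return absolute, de-duplicated URLs from links inside div.result blocks."""
    soup = BeautifulSoup(page_html, "html.parser")
    urls = []
    for a in soup.select("div.result a[href]"):
        url = urljoin(base_url, a["href"])  # resolve relative links
        if url not in urls:                 # keep first occurrence only
            urls.append(url)
    return urls

urls = result_urls(html, BASE_URL)
print(urls)
```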

4 answers

Extracting text from PDF url file with Python

I want to extract text from a PDF file that's on a website. The website contains a link to the PDF document, but when I click on that link it automatically downloads the file. Is it possible to extract text from that file without downloading it? import fitz #…

python pdf beautifulsoup

asked Nov 24 '20 at 12:37

taga

2 answers

Using Python to identify ETF holdings

I would like to create a web scraper that collects the specific holdings of an ETF. I found that Zacks.com creates a nice list of what I am looking for. I am trying to use BeautifulSoup; however, I am having a difficult time pinpointing the data under…

python beautifulsoup finance

asked Nov 19 '20 at 08:38

mshudoma

2 answers

Extracting tag from bs4.element.tag returns empty string

I am trying to extract all of the answers from a Quora URL following a tutorial. My code looks like this: url = 'https://www.quora.com/Should-I-move-to-London' r = requests.get(url) soup = BeautifulSoup(r.content, 'html.parser') answers =…

python html json beautifulsoup

asked Nov 16 '20 at 16:08

linacarrillo

2 answers

Parsing ::before with BS4

I tried parsing a web page and ran into ::before in the page HTML. url = 'https://kant-sport.ru/sports/skiing/svobodnoe-katanie/' # Getting whole page page = get(url) # Making soup soup = BS(page.content, 'html.parser') # Getting table table =…

python parsing beautifulsoup request

asked Nov 15 '20 at 10:38

TimNekk
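Some background for this question: ::before is a CSS pseudo-element. Its text comes from the content property in a stylesheet and is not part of the HTML that Beautiful Soup parses. If the rule happens to sit in an inline style block, one rough workaround is to read the content values from the stylesheet text. The regex below is a naive sketch under that assumption, and a real CSS parser would be more robust. The class names and values are hypothetical.

```python
import re
from bs4 import BeautifulSoup

html = """
<style>
  .price::before { content: "from "; }
  .unit::after { content: " RUB"; }
</style>
<span class="price">1200</span>
"""
soup = BeautifulSoup(html, "html.parser")

# The generated "from " text is not in the element's own text.
price_text = soup.select_one(".price").get_text()
print(price_text)  # 1200

# Naively map "selector::before/after" to its quoted content value.
css = soup.style.string
rule = re.compile(r'([^{}]+?)::(before|after)\s*\{[^}]*?content:\s*"([^"]*)"')
generated = {sel.strip() + "::" + kind: text for sel, kind, text in rule.findall(css)}
print(generated)
```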

1 answer

Getting names from a website with a list doesn't always work

I have the following…

python beautifulsoup request

asked Nov 02 '20 at 11:52

Greenyuno

1 answer

How to scrape a dynamic JavaScript variable value using BeautifulSoup and Requests

I am scraping a login page, and I only need the VAR SALT= variable in a JavaScript tag. This is the website: https://ib.muamalatbank.com/ib-app/loginpage After reading all the answers here, using BeautifulSoup and requests, I can get these 2 variables (Maybe because…

javascript python beautifulsoup

asked Oct 26 '20 at 23:57

Neo
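If the server writes the value directly into the HTML, it can usually be pulled out of the script text with a regular expression, as in this sketch. The sample markup and the exact var SALT = "..." format are assumptions. If the value is generated by JavaScript at runtime, requests will not see it at all.

```python
import re
from bs4 import BeautifulSoup

html = """
<html><head>
<script src="/static/app.js"></script>
<script>
  var TOKEN = "abc";
  var SALT = "9f8e7d6c";
</script>
</head><body></body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Match var SALT = "..." (or single quotes) and capture the value.
pattern = re.compile(r"""var\s+SALT\s*=\s*["']([^"']*)["']""")

salt = None
for script in soup.find_all("script"):
    text = script.string or ""  # external <script src=...> tags have no text
    match = pattern.search(text)
    if match:
        salt = match.group(1)
        break

print(salt)  # 9f8e7d6c
```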

2 answers

Unable to grab div tag in Beautiful Soup in Python

I'm trying to download all the pokemon images available on the official website. The reason I'm doing this is because I want high quality images. Following is the code that I wrote. from bs4 import BeautifulSoup as bs4 import requests request =…

python beautifulsoup

asked Oct 24 '20 at 05:54

Lawhatre

0 answers

Dependency hell with beautifulsoup4 and lxml

I have built a small utility using Python 3.8. Among other things it extracts some data from XML files using beautifulsoup4 and lxml. I use PyCharm and virtualenv for development and my utility works just fine. In order to distribute the util to…

python beautifulsoup pip zipapp

asked Oct 19 '20 at 14:08

Robert Petermeier
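Some context: Python's zipimport cannot load compiled extension modules such as lxml's. Inside a zipapp the lxml-based tree builders may therefore be unavailable, and BeautifulSoup(..., "xml") raises bs4.FeatureNotFound. One possible stopgap is to fall back to the pure-Python "html.parser", as sketched below. That parser treats the input as HTML (for example, tag names are lowercased), so it only suits simple XML. The sample document is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

xml = "<catalog><book id='1'><title>Dune</title></book></catalog>"

def make_soup(data):
    """Prefer lxml's XML parser; fall back to the pure-Python html.parser."""
    try:
        # The "xml" feature is only registered when lxml can be imported.
        return BeautifulSoup(data, "xml")
    except FeatureNotFound:
        # html.parser needs no compiled extension, but it parses the input
        # as HTML, so XML-specific behavior (e.g. case-sensitive tag names) is lost.
        return BeautifulSoup(data, "html.parser")

soup = make_soup(xml)
title = soup.find("title").get_text()
print(title)  # Dune
```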

6 answers

How can I locate the highlighted element in the picture? I use Selenium

I am not sure why I can't locate this element. I am using Selenium because the page loads dynamically. Here is my…

python selenium beautifulsoup

asked Oct 06 '20 at 06:18

Talib Daryabi

2 answers

Unable to parse HTML table with Beautiful Soup

I am very new to using Beautiful Soup and I'm trying to import data from the URL below as a pandas DataFrame. However, the final result has the correct column names but no numbers in the rows. What should I be doing instead? Here is my code: from…

python html pandas parsing beautifulsoup

asked Oct 04 '20 at 18:38

Jojo
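Here is a minimal sketch of turning an HTML table into rows with Beautiful Soup. Header names come from the th cells, and each data row comes from the td cells of a tr. The sample table is made up. Passing the resulting list to pandas.DataFrame(rows) would give a DataFrame. When a real page yields headers but no data rows, the rows are often filled in later by JavaScript and are not in the HTML that requests returns.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <thead><tr><th>Team</th><th>Wins</th></tr></thead>
  <tbody>
    <tr><td>Red</td><td>10</td></tr>
    <tr><td>Blue</td><td>7</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Column names come from the <th> cells.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(dict(zip(headers, cells)))

print(rows)  # [{'Team': 'Red', 'Wins': '10'}, {'Team': 'Blue', 'Wins': '7'}]
```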

1 answer

Soup not locating proper div tag when searched by text

This is a condensed version of the actual HTML, which has many more tags. html = '''

…

python-3.x regex beautifulsoup

asked Sep 30 '20 at 21:35

MasayoMusic
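The made-up markup below shows one likely cause. The string= argument matches against a tag's .string, which is None as soon as the tag has more than one child, so a div containing a nested span is skipped. Two workarounds are to filter on each tag's full get_text(), or to find the text node itself and walk up with find_parent().

```python
import re
from bs4 import BeautifulSoup

html = """
<div class="card"><span>Status:</span> Shipped</div>
<div class="card">Status: Pending</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The first <div> has two children, so its .string is None and it is skipped.
string_matches = soup.find_all("div", string=re.compile("Status"))
print(len(string_matches))  # 1

# Filtering on the full text of each <div> finds both.
matches = soup.find_all(lambda tag: tag.name == "div" and "Status" in tag.get_text())
print(len(matches))  # 2

# Or locate the text node and climb to its enclosing <div>.
node = soup.find(string=re.compile("Shipped"))
parent_class = node.find_parent("div")["class"]
print(parent_class)  # ['card']
```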

1 answer

Is there a way to scrape a JavaScript page without Selenium in Python?

Is there a way to scrape a JS-rendered web page with Python BeautifulSoup or lxml without Selenium? Thanks.

python web-scraping beautifulsoup

asked Sep 29 '20 at 12:03

Kemal Ebubekir Atabey
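One technique that sometimes works without Selenium relies on where the data actually lives. It is often already embedded in the HTML as JSON (for example in a script tag with type="application/json") or served by a separate JSON endpoint visible in the browser's network tools. The sketch below assumes the first case. The script id and the data are hypothetical.

```python
import json
from bs4 import BeautifulSoup

html = """
<html><body>
<div id="app"></div>
<script id="initial-state" type="application/json">
  {"products": [{"name": "Lamp", "price": 19.5}, {"name": "Desk", "price": 120}]}
</script>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# The visible #app div is empty (JavaScript would fill it in), but the
# data is already present as JSON inside a <script> tag.
script = soup.find("script", id="initial-state", type="application/json")
state = json.loads(script.string)
names = [p["name"] for p in state["products"]]
print(names)  # ['Lamp', 'Desk']
```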

1 answer

BS4 Pagination Results Limited: 10 Rather Than 200

Instead of the output being 10 links on each page, it is only returning the ten links on the last page. In other words, if this was working, the total number of links would be 200. from goose3 import Goose from bs4 import BeautifulSoup from…

python beautifulsoup

asked Sep 24 '20 at 02:30

zbush548
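The asker's code is truncated, but a common cause of getting only the last page's results is creating or reassigning the result list inside the page loop. The sketch below accumulates results across pages instead. The page HTML strings and the a.r selector are hypothetical stand-ins for real responses.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the HTML of successive result pages; in real
# code each string would come from a request for that page's URL.
pages = [
    '<a class="r" href="/p1/a">a</a><a class="r" href="/p1/b">b</a>',
    '<a class="r" href="/p2/a">a</a><a class="r" href="/p2/b">b</a>',
]

all_links = []  # created once, outside the loop
for page_html in pages:
    soup = BeautifulSoup(page_html, "html.parser")
    page_links = [a["href"] for a in soup.select("a.r")]
    all_links.extend(page_links)  # add this page's links instead of replacing them

print(len(all_links))  # 4
```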

1 answer

How to Scrape Answer from Quizlet Flashcard? BS4 and Requests

Using this page as an example: https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/ How would one hypothetically scrape the text answer from behind the flashcard? It's hidden right now, but when you click on it, it rotates and…

python-3.x beautifulsoup python-requests

asked Sep 21 '20 at 22:45

wildcat89
