Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
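
A minimal sketch of the three kinds of operations described above (navigating, searching, modifying). It assumes Python's built-in "html.parser" and a small made-up document.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Fruit</h1>
  <ul id="fruit">
    <li class="red">apple</li>
    <li class="yellow">banana</li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser; "lxml" or "html5lib" can be
# passed instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access returns the first tag with that name.
heading = soup.h1.get_text()                             # 'Fruit'

# Searching: find_all() returns every matching tag, in document order.
names = [li.get_text() for li in soup.find_all("li")]    # ['apple', 'banana']

# Modifying: replace a tag's text in place, then serialize the tree again.
soup.find("li", class_="yellow").string = "lemon"
updated = str(soup.ul)
```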

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup, and is mostly backwards-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

Since version 4.7.0, Beautiful Soup supports a wide range of CSS selectors, including many from CSS Selectors Level 4, adding to its already rich set of tools for selecting HTML/XML elements. The full list of supported selectors and pseudo-classes is documented by soupsieve, the library bs4 uses for CSS selection.
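
A short sketch of CSS selection through select() and select_one(), using two Level 4 features that soupsieve handles: :has() and :not() with a selector list. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div class="post"><a href="/a">A</a></div>
<div class="post">no link</div>
<ul><li>one</li><li class="skip">two</li><li>three</li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# :has() with a relative selector: .post divs that have a direct <a> child.
with_links = soup.select("div.post:has(> a)")

# select_one() returns the first match, or None if nothing matches.
first_item = soup.select_one("ul li")

# :not() with a selector list: skip .skip items and the first child.
kept = [li.get_text() for li in soup.select("li:not(.skip, :first-child)")]  # ['three']
```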

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported in a project like this: from bs4 import BeautifulSoup

Notice: Beautiful Soup 3 works only on Python 2.x. Beautiful Soup 4 supported both Python 2 (2.7+) and Python 3 up to version 4.9.3; releases from 4.10.0 onward require Python 3.

32305 questions

18 answers

Converting html to text with Python

I am trying to convert an html block to text using Python. Input:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean…

asked Feb 04 '13 at 19:55

Aaron Bandelli

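One common approach to this question, sketched with get_text(). The input HTML here is a stand-in, since the question's markup was truncated. Flattening the whole tree can merge or split words around inline tags, so the second variant extracts each block element on its own line.

```python
from bs4 import BeautifulSoup

html = ("<div><h2>Title</h2>"
        "<p>Lorem ipsum <b>dolor</b> sit amet.</p>"
        "<p>Aenean commodo.</p></div>")
soup = BeautifulSoup(html, "html.parser")

# get_text() joins every text node; strip=True trims each one and drops
# whitespace-only nodes, and the separator goes between the pieces.
flat = soup.get_text(" ", strip=True)

# To keep one line per block element, extract each block separately.
lines = "\n".join(el.get_text(" ", strip=True) for el in soup.find_all(["h2", "p"]))
```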

4 answers

Using BeautifulSoup to search HTML for string

I am using BeautifulSoup to look for user-entered strings on a specific page. For example, I want to see if the string 'Python' is located on the page: http://python.org When I used: find_string = soup.body.findAll(text='Python'), find_string…

python beautifulsoup

asked Jan 20 '12 at 02:18

kachilous

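A likely explanation, shown as a sketch: a plain string passed as the search target (text= in older code, string= since bs4 4.4) matches only text nodes whose entire content equals that string. A compiled regular expression is matched with search(), so it finds the word inside longer text. The page content below is made up rather than fetched from python.org.

```python
import re
from bs4 import BeautifulSoup

html = "<body><p>Welcome to Python.org</p><p>Python</p><p>Other</p></body>"
soup = BeautifulSoup(html, "html.parser")

# Exact match: only the <p>Python</p> text node qualifies.
exact = soup.body.find_all(string="Python")

# Substring match via regex: both nodes containing "Python" qualify.
partial = soup.body.find_all(string=re.compile("Python"))

# If only a yes/no answer is needed, test the flattened page text.
found = "Python" in soup.get_text()
```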

8 answers

BeautifulSoup innerhtml?

Let's say I have a page with a div. I can easily get that div with soup.find(). Now that I have the result, I'd like to print the WHOLE innerhtml of that div: I mean, I'd need a string with ALL the html tags and text all together, exactly like the…

python html beautifulsoup innerhtml

asked Nov 13 '11 at 16:26

Matteo Monti

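A sketch of getting a tag's inner HTML, assuming a recent bs4 where Tag.decode_contents() is available. Joining str() of each child in .contents gives the same result, while str(tag) includes the tag itself (the "outer" HTML).

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="box"><p>Hi <b>there</b></p> text</div>', "html.parser")
div = soup.find("div", id="box")

# Serialized children only, without the surrounding <div> tag.
inner = div.decode_contents()          # '<p>Hi <b>there</b></p> text'

# Equivalent manual version: serialize each child and concatenate.
inner_alt = "".join(str(child) for child in div.contents)

# For comparison, str() of the tag itself includes the <div> wrapper.
outer = str(div)
```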

1 answer

Beautifulsoup : Difference between .find() and .select()

When you use BeautifulSoup to scrape a certain part of a website, you can use soup.find() and soup.findAll() or soup.select(). Is there a difference between the .find() and the .select() methods (e.g. in performance or flexibility)? Or are…

python python-3.x beautifulsoup

asked Jun 25 '16 at 12:09

Dieter

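A side-by-side sketch of the return types. It makes no claim about performance: find()/find_all() filter by tag name and attributes, while select()/select_one() take a CSS selector string, which can express nesting such as "div.b > p" in one expression.

```python
from bs4 import BeautifulSoup

html = '<div class="a"><p>one</p></div><div class="b"><p>two</p><p>three</p></div>'
soup = BeautifulSoup(html, "html.parser")

# find(): first tag matching name/attributes, or None.
first = soup.find("div", class_="b")
missing = soup.find("span")            # None: no <span> in the document

# find_all(): list of every matching tag.
all_p = soup.find_all("p")

# select(): list of tags matching a CSS selector.
b_paras = soup.select("div.b > p")

# select_one(): first CSS match, or None.
one = soup.select_one("div.b > p")
```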

3 answers

BeautifulSoup getText from between <p>, not picking up subsequent paragraphs

Firstly, I am a complete newbie when it comes to Python. However, I have written a piece of code to look at an RSS feed, open the link and extract the text from the article. This is what I have so far: from BeautifulSoup import BeautifulSoup import…

python python-2.7 beautifulsoup

asked Sep 17 '12 at 00:52

Darren Wadley
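
A frequent cause of this symptom, sketched under the assumption that the code calls find("p"): find() stops at the first match, so only the first paragraph is returned. find_all("p") collects every paragraph. The example uses bs4 (from bs4 import BeautifulSoup) rather than the Beautiful Soup 3 import shown in the question, and the article markup is made up.

```python
from bs4 import BeautifulSoup

article_html = "<div class='story'><p>First paragraph.</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(article_html, "html.parser")

# find() returns only the first <p>.
first_only = soup.find("p").get_text()

# find_all() returns every <p>; join their texts with blank lines.
body = "\n\n".join(p.get_text(strip=True) for p in soup.find_all("p"))
```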

3 answers

Using BeautifulSoup to find a HTML tag that contains certain text

I'm trying to get the elements in an HTML doc that contain the following pattern of text: #\S{11}

<h2> this is cool #12345678901 </h2>

So, the previous would match by using: soup('h2',text=re.compile(r' #\S{11}')) And the results would be…

python regex beautifulsoup html-content-extraction

asked May 14 '09 at 21:46

sotangochips

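Two ways to get the tags whose text matches a pattern, sketched with made-up markup. Searching with string= returns the matching text nodes; their .parent is the enclosing tag, which may be an inner tag such as <b>. A function filter instead tests each tag's full text, so it returns the <h2> even when the match sits inside a child tag.

```python
import re
from bs4 import BeautifulSoup

html = ("<h2>this is cool #12345678901</h2>"
        "<h2>short #123</h2>"
        "<h2>nested <b>#abcdefghijk</b></h2>")
pattern = re.compile(r"#\S{11}")
soup = BeautifulSoup(html, "html.parser")

# Match text nodes, then step up to the tag that directly contains each one.
via_strings = [s.parent for s in soup.find_all(string=pattern)]

# Match <h2> tags whose combined text (including children) contains the pattern.
via_function = soup.find_all(
    lambda tag: tag.name == "h2" and pattern.search(tag.get_text()) is not None
)
```

With this input, via_strings holds the first <h2> and the inner <b>, while via_function holds the first and third <h2>.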

6 answers

What should I use to open a url instead of urlopen in urllib3

I wanted to write a piece of code like the following: from bs4 import BeautifulSoup import urllib2 url = 'http://www.thefamouspeople.com/singers.php' html = urllib2.urlopen(url) soup = BeautifulSoup(html) But I found that I have to install urllib3…

python web-scraping beautifulsoup urllib3

asked Apr 09 '16 at 11:33

niloofar

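In Python 3, urllib2's urlopen lives in the standard library as urllib.request.urlopen; urllib3 is a separate third-party package with a different API. A minimal sketch of the question's code ported to Python 3 (get_soup is a hypothetical helper name):

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def get_soup(url):
    """Fetch url with the standard library and parse the response body."""
    # urllib2.urlopen (Python 2) became urllib.request.urlopen (Python 3).
    with urlopen(url) as response:
        return BeautifulSoup(response.read(), "html.parser")


# Usage (needs network access):
# soup = get_soup("http://www.thefamouspeople.com/singers.php")
```

The third-party requests library is another common choice; in that case, pass requests.get(url).text to BeautifulSoup.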

4 answers

How to get rid of BeautifulSoup user warning?

After I installed BeautifulSoup, whenever I run my Python from the command line, this warning comes out: D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166: UserWarning: No parser was explicitly specified,…

python beautifulsoup

asked Nov 04 '15 at 00:13

jellyfishhuang
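
The warning in this question is emitted when no parser is named, so bs4 picks the best one installed and warns that the choice may differ on other machines. A sketch of the usual fix, passing the parser as the second argument ("html.parser" is assumed here; "lxml" or "html5lib" work the same way if installed):

```python
from bs4 import BeautifulSoup

html = "<p>Hello</p>"

# Naming the parser explicitly removes the "No parser was explicitly
# specified" warning and makes results consistent across machines.
soup = BeautifulSoup(html, "html.parser")
```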

3 answers

Python BeautifulSoup give multiple tags to findAll

I'm looking for a way to use findAll to get two tags, in the order they appear on the page. Currently I have: import requests import BeautifulSoup def get_soup(url): request = requests.get(url) page = request.text soup =…

python beautifulsoup

asked Dec 18 '13 at 02:28

DasSnipez

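In bs4, passing a list of names to find_all() (the modern spelling of findAll) matches any of them, and results come back in the order they appear in the document. A CSS selector list does the same. The markup below is invented for illustration.

```python
from bs4 import BeautifulSoup

html = "<h2>A</h2><p>a1</p><h2>B</h2><p>b1</p><p>b2</p>"
soup = BeautifulSoup(html, "html.parser")

# A list of tag names matches any of them, in document order.
items = [(tag.name, tag.get_text()) for tag in soup.find_all(["h2", "p"])]

# CSS equivalent: a comma-separated selector list.
items_css = [(tag.name, tag.get_text()) for tag in soup.select("h2, p")]
```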

2 answers

UnicodeEncodeError: 'ascii' codec can't encode character at special name

My python (ver 2.7) script is running well to get some company name from local html files but when it comes to some specific country name, it gives this error "UnicodeEncodeError: 'ascii' codec can't encode character" Specially getting error when…

python unicode encoding beautifulsoup ascii

asked Jun 30 '15 at 11:53

rhb1

4 answers

Extract the 'src' attribute from an 'img' tag using Beautiful Soup

Consider:

I want to extract the source (i.e., src) attribute from an image (i.e., img) tag using Beautiful Soup. I use Beautiful Soup 4, and I cannot…

python regex beautifulsoup

asked May 15 '17 at 14:25

iDelusion
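
No regular expression is needed: attributes are read like dictionary entries on the tag. A sketch with made-up markup, since the question's HTML was not preserved. tag["src"] raises KeyError when the attribute is missing, while tag.get("src") returns None.

```python
from bs4 import BeautifulSoup

html = '<div><img src="/logo.png" alt="logo"><img alt="no src"></div>'
soup = BeautifulSoup(html, "html.parser")

# Subscript access on the first <img>.
first_src = soup.find("img")["src"]

# .get() tolerates images without a src attribute.
all_srcs = [img.get("src") for img in soup.find_all("img")]

# src=True keeps only <img> tags that have a src attribute at all.
only_with_src = [img["src"] for img in soup.find_all("img", src=True)]
```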

8 answers

beautifulsoup, html5lib: module object has no attribute _base

When I updated my packages I have this new error: class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder): AttributeError: 'module' object has no attribute '_base' I tried to update beautifulsoup, with no more result. How can I fix…

beautifulsoup html5lib

asked Jul 19 '16 at 00:14

Ehvince


5 answers

Get meta tag content property with BeautifulSoup and Python

I am trying to use python and beautiful soup to extract the content part of the tags below: I'm…

python html web-scraping beautifulsoup

asked Apr 21 '16 at 11:22

the_t_test_1

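A sketch using hypothetical <meta> tags, since the question's tags were not preserved. Attributes can be passed as keyword filters to find(), except name, which collides with find()'s first parameter; that one goes through attrs.

```python
from bs4 import BeautifulSoup

html = """<head>
<meta property="og:title" content="Example title">
<meta name="description" content="Example description">
</head>"""
soup = BeautifulSoup(html, "html.parser")

# Keyword filter on the "property" attribute.
title = soup.find("meta", property="og:title")

# "name" must be passed via attrs, because find()'s first argument is the tag name.
desc = soup.find("meta", attrs={"name": "description"})

# find() returns None when nothing matches, so guard before reading content.
title_content = title["content"] if title else None
```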

4 answers

Using BeautifulSoup to extract text without tags

My webpage looks like this:

YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
…

python web-scraping beautifulsoup

asked Apr 30 '14 at 05:15

myloginid

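The question's real markup was lost, so this sketch assumes hypothetical HTML where each label sits in a <strong> tag followed by its value. stripped_strings yields every text node with the surrounding whitespace removed, which makes it easy to pair labels with values.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the question's page.
html = ("<p><strong>YOB:</strong> 1987<br/>"
        "<strong>RACE:</strong> WHITE<br/>"
        "<strong>GENDER:</strong> FEMALE</p>")
soup = BeautifulSoup(html, "html.parser")

# Text nodes only, tags dropped: ['YOB:', '1987', 'RACE:', 'WHITE', ...]
parts = list(soup.p.stripped_strings)

# Pair each "LABEL:" with the value that follows it.
record = {label.rstrip(":"): value for label, value in zip(parts[::2], parts[1::2])}
```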

3 answers

How to write the output to html file with Python BeautifulSoup

I modified an html file by removing some of the tags using beautifulsoup. Now I want to write the results back in a html file. My code: from bs4 import BeautifulSoup from bs4 import Comment soup =…

python html beautifulsoup

asked Nov 10 '16 at 14:21

Kim Hyesung
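
A sketch of the round trip. The specific edits (removing comments and <script> tags) are assumptions standing in for the question's truncated code, and save_html is a hypothetical helper. str(soup) serializes the modified tree; writing with an explicit UTF-8 encoding avoids platform-dependent defaults.

```python
from bs4 import BeautifulSoup, Comment

html = "<html><body><!-- note --><p>Keep me</p><script>track()</script></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Example modifications: drop HTML comments and <script> tags.
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()
for script in soup.find_all("script"):
    script.decompose()


def save_html(soup, path):
    """Write the (modified) tree to path as UTF-8 HTML."""
    # str(soup) keeps the original layout; soup.prettify() would re-indent it.
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(soup))


# Usage: save_html(soup, "output.html")
```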
