Questions tagged [html5lib]

html5lib is a library for parsing and serializing HTML documents and fragments in Python, with ports to Dart, PHP, and Ruby.

html5lib is an open-source, pure-Python HTML parser designed to conform to the WHATWG HTML specification. There are ports for PHP and Ruby (both unmaintained), as well as a third-party port for Dart.
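
As a rough illustration of the "parsing and serializing" described above, here is a minimal sketch. It assumes html5lib is installed (`pip install html5lib`); the helper names `first_paragraph_text` and `roundtrip` are hypothetical and exist only for this example. If the import fails, both helpers return `None` rather than raising.

```python
# Minimal sketch: parse a fragment of tag soup and serialize it back.
# Assumes the third-party html5lib package is installed.
try:
    import html5lib
except ImportError:  # keep the sketch importable without html5lib
    html5lib = None


def first_paragraph_text(markup):
    """Return the text of the first <p> element, or None if unavailable."""
    if html5lib is None:
        return None
    # namespaceHTMLElements=False keeps tag names un-namespaced ("p", not
    # "{http://www.w3.org/1999/xhtml}p"), which makes ElementTree paths simpler.
    root = html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)
    paragraph = root.find(".//p")
    return paragraph.text if paragraph is not None else None


def roundtrip(markup):
    """Parse markup and serialize the resulting tree back to a string."""
    if html5lib is None:
        return None
    root = html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)
    return html5lib.serialize(root, tree="etree")


if __name__ == "__main__":
    sample = "<p>Hello<p>World"  # unclosed tags, as found in real pages
    print(first_paragraph_text(sample))
    print(roundtrip(sample))
```

The parser repairs the unclosed `<p>` tags the way a browser would, so the first paragraph contains only its own text.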

107 questions
5
votes
1 answer

How to fix "unexpected keyword argument 'useChardet'" in html5lib

I'm using html5lib and after updating it to the latest version, I keep getting this error: Traceback (most recent call last): File "/home/travis/build/freelawproject/juriscraper/tests/test_everything.py", line 119, in…
mlissner
4
votes
1 answer

html5lib: TypeError: __init__() got an unexpected keyword argument 'encoding'

I'm trying to install html5lib. At first I tried to install the latest version (8 or 9 nines), but it conflicted with my BeautifulSoup, so I decided to try an older version (0.9999999, seven nines). I installed it, but when I try to use…
parsecer
4
votes
1 answer

python3 - No module named 'html5lib'

I'm running a python3 program that requires html5lib, but I receive the error No module named 'html5lib'. Here are two terminal sessions: sam@pc ~ $ python Python 2.7.9 (default, Mar 1 2015, 12:57:24) [GCC 4.9.2] on linux2 >>> import html5lib >>>…
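
A symptom like the one in this excerpt, where one interpreter imports the module and another does not, can usually be narrowed down by asking the interpreter itself where it looks. The following is a small diagnostic sketch using only the standard library; the function name `module_status` is hypothetical.

```python
# Diagnostic sketch: report whether a module is visible to *this* interpreter.
import importlib.util
import sys


def module_status(name):
    """Return a dict describing whether `name` can be imported here."""
    spec = importlib.util.find_spec(name)  # None when the module is not found
    return {
        "interpreter": sys.executable,
        "python": sys.version_info[:3],
        "module": name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    report = module_status("html5lib")
    print(report)
    if not report["found"]:
        # Installing through the same interpreter avoids a pip/python mismatch.
        print(f"Try: {sys.executable} -m pip install html5lib")
```

Running the script once with each interpreter (for example `python` and `python3`) shows which one has the package on its path.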
4
votes
1 answer

Why can't pip search find certain packages when they can still be installed via pip install?

Why can't pip search find certain packages (for example, html5lib) when they can still be installed via pip install? E:\software\Python276\Scripts>pip search html5lib html5lib-truncation - Truncating HTML with html5lib…
FrozenHeart
4
votes
1 answer

lxml html5parser ignores "namespaceHTMLElements=False" option

The lxml html5parser seems to ignore any namespaceHTMLElements=False option I pass to it. It puts all elements I give it into the HTML namespace instead of the (expected) void namespace. Here’s a simple case that reproduces the problem: echo " " |…
sideshowbarker
4
votes
2 answers

Xpath with html5lib in PHP

I have this basic code that doesn't work. How can I use Xpath with html5lib php? Or Xpath with HTML5 in any other way. $url = 'http://en.wikipedia.org/wiki/PHP'; $response = GuzzleHttp\get($url); $html5 = new Masterminds\HTML5(); $dom =…
Markus Hedlund
3
votes
1 answer

Django CMS "No module named html5lib"

I have a basic Django CMS site installed with all the default and recommended modules; however, I receive an error saying... Updated Request Method: GET Request URL: http://teamdjango.lnukapps.co.uk/admin/cms/page/21/ Django Version:…
rockingskier
3
votes
2 answers

Error 'html5lib not found' when using pandas.read_html() function in conda env

Current code: import requests import pandas as pd url = 'https://docs.anaconda.com/anaconda/user-guide/getting-started/' html = requests.get(url, verify=False).content df_list = pd.read_html(html, flavor='bs4') df = df_list[0] I'm trying to…
GeosGeek
3
votes
2 answers

Html5 find/parse specific element in page python

I'm trying to learn how to find/parse data from html5 webpages to use in a database. I want to learn how to find/parse the data from only the first of this '//div[@class="col-xs-12 col-sm-6 col-md-4 col-lg-3"]' I've tried html5lib, from lxml import…
Marie Anne
3
votes
3 answers

How can I add consistent whitespace to existing HTML using Python?

I just started working on a website that is full of pages with all their HTML on a single line, which is a real pain to read and work with. I'm looking for a tool (preferably a Python library) that will take HTML input and return the same HTML…
mjjohnson
3
votes
1 answer

Parsing with BeautifulSoup, error message TypeError: coercing to Unicode: need string or buffer, NoneType found

I'm trying to scrape an Amazon page for data, and I'm getting an error when I try to parse for where the seller is located. Here's my code: #getting the html request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/') opener =…
3
votes
1 answer

Can't open html5lib in Python

I just installed html5lib for Python with Windows Command Prompt. The package was installed here: File "C:\Python27\lib\site-packages\html5lib However, if I try to import html5lib: #! /usr/bin/python import html5lib I get the following…
LaGuille
3
votes
2 answers

Python BeautifulSoup Error

I have this script: import urllib2 from BeautifulSoup import BeautifulSoup import html5lib import lxml soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read()) But this gives me the following error: Traceback (most recent call…
torayeff
2
votes
1 answer

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib

I got this error when running my Python code: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library? So I searched online and read this. I checked my installed packages,…
raffa
2
votes
0 answers

Python dependency issues with Django on Docker

I'm new to Docker and I'm having trouble porting my existing, working Django project to Docker. I'm pretty much stuck right now, since the issue is with the dependencies in my requirements.txt, which are frozen and actually work on…
J4ckN1x