
I'm getting the error TypeError: insertDoctype() takes exactly 4 arguments (2 given) when using lxml and html5lib together. It seems that the insertDoctype method in lxml.html._html5builder.TreeBuilder takes 4 arguments (counting self), while the html5lib code calls it with only 2. Am I somehow using this wrong?

These are the versions I'm using:

$ pip freeze
BeautifulSoup==3.2.0
distribute==0.6.14
html5lib==0.90
lxml==2.3
mechanize==0.2.4
wsgiref==0.1.2

My source code:

from lxml.html import html5parser

html5parser.document_fromstring('''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>t</title><body></body></html>''')

And the error:

Traceback (most recent call last):
  File "/tmp/t.py", line 4, in <module>
    <html><head><title>t</title><body></body></html>''')
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/lxml/html/html5parser.py", line 54, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 211, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 111, in _parse
    self.mainLoop()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 189, in mainLoop
    self.phase.processDoctype(token)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 482, in processDoctype
    self.tree.insertDoctype(token)
TypeError: insertDoctype() takes exactly 4 arguments (2 given)
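The argument counts in this Python 2 message include self: a method with three parameters after self expects 4 positional arguments, but a call shaped like self.tree.insertDoctype(token) supplies only 2 (self and token). The sketch below reproduces that mismatch with a hypothetical stand-in class. It is not lxml's actual TreeBuilder, and the token's key names are an assumption.

```python
class StringArgBuilder(object):
    """Hypothetical stand-in for a builder whose insertDoctype expects three strings."""

    def insertDoctype(self, name, publicId, systemId):
        # Store the three parts of the doctype separately.
        self.doctype = (name, publicId, systemId)


# Hypothetical doctype token as a single dict (key names are assumed).
token = {
    "name": "html",
    "publicId": "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "systemId": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
}

builder = StringArgBuilder()
message = None
try:
    # Only one argument besides self, like html5lib's insertDoctype(token) call.
    builder.insertDoctype(token)
except TypeError as exc:
    # Python 2.6 says "takes exactly 4 arguments (2 given)";
    # Python 3 names the missing positional arguments instead.
    message = str(exc)
    print("TypeError: %s" % message)

# Passing the three strings separately matches the signature and succeeds.
builder.insertDoctype(token["name"], token["publicId"], token["systemId"])
```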
lmz
  • I reproduced the error while checking that it wasn't just because you aren't using the HTML5 doctype. `html5parser.document_fromstring(' t')` gives the same error. – Prydie Apr 03 '11 at 15:43
  • Yes, it looks like lxml's tree builder and html5lib 0.90 are not compatible. Fixing insertDoctype to handle the single parameter only leads to errors elsewhere. It looks like html5lib now passes dictionaries instead of separate strings to the tree builder. – lmz Apr 03 '11 at 17:17
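The dictionary-versus-strings observation can be illustrated with a hypothetical shim that accepts either calling convention and unpacks a token dict before delegating to a three-string method. This is only a sketch: the class names are made up, the key names name, publicId and systemId are assumed, and as the comment says, patching insertDoctype alone does not make lxml 2.3 work with html5lib 0.90, because failures then appear elsewhere.

```python
class StringArgBuilder(object):
    """Hypothetical builder with the older three-string insertDoctype signature."""

    def insertDoctype(self, name, publicId, systemId):
        self.doctype = (name, publicId, systemId)


class TokenAdapterBuilder(StringArgBuilder):
    """Hypothetical shim that also accepts a single doctype token dict."""

    def insertDoctype(self, *args):
        if len(args) == 1 and isinstance(args[0], dict):
            token = args[0]
            # Assumed key names; missing IDs fall back to None.
            args = (token["name"], token.get("publicId"), token.get("systemId"))
        # Delegate to the three-string version in either case.
        super(TokenAdapterBuilder, self).insertDoctype(*args)


shim = TokenAdapterBuilder()
shim.insertDoctype({"name": "html", "publicId": None, "systemId": None})
print(shim.doctype)  # ('html', None, None)
```

Accepting both forms keeps the old three-string call working while tolerating the new single-token call, but it only addresses this one method.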
