
If I open an HTML file, base_result.html, with PyQuery using the `filename` keyword, it returns [None] and raises errors when I search it. If I read that same file and pass its contents as a string, everything works.

>>> from pyquery import PyQuery
>>> d = PyQuery(filename = 'base_result.html')
>>> d
[None]
>>> f = open('base_result.html')
>>> d = PyQuery(f.read())
>>> d
[<html>]
maged
  • Do you have a question? This is the documented behavior. – Henry Keiter Aug 06 '13 at 17:52
  • Is this the documented behaviour? I have two identical files, one online and one local, but the parsing for 'url = ', and 'filename = ' is different. – maged Aug 06 '13 at 18:30
  • I stand corrected; I can't see why it would return `None` (though if the parsing for `url=` and `filename=` were meant to be the same, they wouldn't need two separate keywords!). But yeah, I don't know how you're getting a None return value. Are you sure you have the latest version? – Henry Keiter Aug 06 '13 at 18:39
  • Yeah, it's the latest (from https://github.com/gawel/pyquery). Two keywords make sense, because loading an HTML file from a URL and loading one from a file path require different Python functions. I guess it could have been parsed, though. – maged Aug 06 '13 at 18:43

1 Answer


It's an open issue in PyQuery: https://github.com/gawel/pyquery/issues/22

Some workarounds are mentioned in the linked issue, such as:

>>> from lxml.html import parse
>>> from pyquery import PyQuery as pq
>>> parse("index.html")
<lxml.etree._ElementTree object at 0x108a72f38>
>>> pq(parse("index.html").getroot())

or

>>> from pyquery import PyQuery
>>> f = open('index.html')
>>> d = PyQuery(f.read())
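
The second workaround leaves the file handle open. A minimal sketch of the same idea wrapped in a helper might look like the following. The names `load_markup` and `parser` are hypothetical, and the UTF-8 default is an assumption (the question does not say how the file is encoded). The parser is passed in as a callable so the helper itself only depends on the standard library.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_markup(path: str, parser: Callable[[str], T], encoding: str = "utf-8") -> T:
    """Read an HTML file as text and hand the resulting string to `parser`.

    `load_markup` is a hypothetical helper, not part of PyQuery.
    The UTF-8 default is an assumption; pass another encoding if needed.
    """
    # read_text opens the file, reads it fully, and closes it again.
    text = Path(path).read_text(encoding=encoding)
    return parser(text)


# Intended use with PyQuery (requires the pyquery package):
#   from pyquery import PyQuery
#   d = load_markup("index.html", PyQuery)
```

This only sidesteps the `filename=` code path by always going through the string constructor; it does not fix the underlying issue.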
maged