I'm messing around with lxml in Python, but I can't figure out how to use the cssselect() function to get all divs with the class reddit-entry; it seems to dislike the - character. Any other class name without a - works fine.

Martijn Pieters

RobinJ
Umm... not sure about that - does `obj.xpath('//div[@class="reddit-entry"]')` work? – Jon Clements Jun 23 '12 at 13:49
2 Answers
That’s a bug in the parser in lxml.cssselect. I took over maintenance of the project and extracted it from lxml. The bug is fixed in the new cssselect: http://packages.python.org/cssselect/
lxml 2.4 will use the new cssselect, but until then the way to use it is:
from cssselect import HTMLTranslator
# lxml_document is an already-parsed lxml tree or element (e.g. from lxml.html.fromstring)
result = lxml_document.xpath(HTMLTranslator().css_to_xpath('div.reddit-entry'))

Simon Sapin
If you run the XPath expression that cssselect generates directly via xpath(), it does work:
obj.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), ' reddit-entry ')]")
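To show why the expression pads the class attribute with spaces, here is a small sketch assuming lxml is installed; the markup is invented for illustration. It contrasts the token match above with the exact `@class` comparison suggested in the comment on the question.

```python
import lxml.html

# Hypothetical sample markup for illustration
doc = lxml.html.fromstring(
    '<div>'
    '<div class="reddit-entry">a</div>'
    '<div class="thing reddit-entry">b</div>'
    '<div class="reddit-entry-extra">c</div>'
    '</div>'
)

# Token match: wrapping the normalized class list in spaces means ' reddit-entry '
# only matches a whole class name, not a prefix such as 'reddit-entry-extra'
token_match = doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' reddit-entry ')]"
)

# Exact attribute match only hits divs whose class attribute is exactly "reddit-entry"
exact_match = doc.xpath('//div[@class="reddit-entry"]')

print([d.text for d in token_match])  # ['a', 'b']
print([d.text for d in exact_match])  # ['a']
```

The exact comparison misses the div that carries a second class, which is why the padded `contains()` form is the usual XPath equivalent of a CSS class selector.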

Jon Clements