I'm experimenting with lxml in Python, but I can't figure out how to use the cssselect() function to get all divs with the class reddit-entry; it seems to choke on the - character. Any class name without a - works fine.

Martijn Pieters
RobinJ

2 Answers


That’s a bug in the parser in lxml.cssselect. I took over maintenance of the project and extracted it from lxml. The bug is fixed in the new cssselect: http://packages.python.org/cssselect/

lxml 2.4 will use the new cssselect, but until then the way to use it is:

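from cssselect import HTMLTranslator
# lxml_document is an already-parsed lxml tree; translate the selector to XPath first
expression = HTMLTranslator().css_to_xpath('div.reddit-entry')
result = lxml_document.xpath(expression)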
Simon Sapin

If you run the XPath expression that cssselect generates directly through xpath(), it does work:

obj.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), ' reddit-entry ')]")
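To see why that predicate matches whole class names, hyphens included, here is a rough plain-Python sketch of the same check. The function name has_class is made up for illustration and is not part of lxml or cssselect; the whitespace handling is meant to approximate XPath's normalize-space().

```python
import re


def has_class(class_attr: str, name: str) -> bool:
    """Hypothetical helper mirroring the XPath predicate
    contains(concat(' ', normalize-space(@class), ' '), ' name ').
    """
    # Like normalize-space(): trim the ends and collapse whitespace runs
    # (space, tab, CR, LF) into single spaces.
    tokens = [t for t in re.split(r"[ \t\r\n]+", class_attr) if t]
    normalized = " ".join(tokens)
    # Pad both sides with a space so only complete class tokens can match.
    return f" {name} " in f" {normalized} "


print(has_class("  post\treddit-entry  hot", "reddit-entry"))
print(has_class("reddit-entry-wide", "reddit-entry"))
```

Because the comparison is a plain substring test on space-padded strings, the - character gets no special treatment, which is consistent with the XPath working even where the old CSS parser failed.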
Jon Clements