I'm using lxml to parse an HTML document that contains a Facebook comments tag that looks like this:

<fb:comments id="fb_comments"  href="http://example.com" num_posts="5" width="600"></fb:comments>

I am trying to select it to get the href value, but when I call cssselect('fb:comments') I get the following error:

The pseudo-class Symbol(u'comments', 3) is unknown

Is there a way to do it?

Edit: The code:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb:comments')  # raises the exception
applechief

1 Answer

The cssselect() method selects elements from the parsed document using the given CSS selector expression. In your case the colon character (:) is an XML namespace prefix separator (i.e. <namespace:tagname/>), which is confused with the CSS pseudo-class syntax (i.e. tagname:pseudo-class).

According to the lxml manual, you should use the namespace-prefix|element syntax in cssselect() in order to find a tag (comments) with a namespace prefix (fb). So:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb|comments')
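
If the goal is only to read the href value, a cross-check that avoids CSS selectors entirely may be useful. The following is a minimal sketch that swaps in the standard library's html.parser instead of lxml, so it does not show lxml behaviour. The class FbCommentsHrefParser, the function find_fb_comments_hrefs, and the sample string are hypothetical names introduced here for illustration.

```python
from html.parser import HTMLParser


class FbCommentsHrefParser(HTMLParser):
    """Hypothetical helper: collects href attributes of <fb:comments> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases the tag name and treats the colon as an
        # ordinary character, so the tag arrives here as "fb:comments".
        if tag == "fb:comments":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def find_fb_comments_hrefs(html):
    """Return the href of every <fb:comments> tag found in the given HTML string."""
    parser = FbCommentsHrefParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Sample markup modelled on the tag from the question (assumed wrapper <div>).
sample = (
    '<div><fb:comments id="fb_comments"  href="http://example.com" '
    'num_posts="5" width="600"></fb:comments></div>'
)
hrefs = find_fb_comments_hrefs(sample)
```

Because this parser never interprets the tag name as a selector, the colon cannot be mistaken for a pseudo-class; the trade-off is that you lose lxml's tree and selector API.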
Mariusz Jamro