How can I get the p tag text "Blahblah" in this situation?

When the text of a p tag comes after a strong tag, lxml does not seem to recognize it.

<p class="user_p"><strong>cc</strong>Blahblah</p>

====code====

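from lxml import html

content = """
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>
"""
# In Python 3 the literal is already a str, so no decode() call is needed
tree = html.fromstring(content)

p = tree.xpath('//div/p')

# First paragraph: the text comes before <strong>
print(p[0].text)

# Second paragraph: the text comes after <strong>
print(p[1].text)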

====output====

Blahblah
None
babayetu

1 Answer

In this HTML fragment,

<p class="user_p"><strong>cc</strong>Blahblah</p>

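the text "Blahblah" is the value of the tail property of the <strong> element. In lxml, an element's text property holds only the text that comes before its first child; text that follows a child's closing tag is stored in that child's tail. That is why the text property of this <p> is None.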

Demo code:

from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>"""

tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
print(s[0].tail)  # text that follows the closing </strong> tag

Output:

Blahblah
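
If you do not know in advance whether the text sits before or after the <strong>, one option is to join the parent's text with the tail of each child. This is only a minimal sketch; the helper name own_text is made up for illustration and is not part of lxml:

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""


def own_text(element):
    """Hypothetical helper: join element.text with each child's tail.

    Text inside the children themselves (here "cc") is deliberately skipped.
    """
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


tree = html.fromstring(content)
paragraphs = tree.xpath('//div/p')
texts = [own_text(p) for p in paragraphs]
print(texts)
```

Both paragraphs should give the same string here, regardless of where the <strong> is placed.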
mzjn
  • You are correct. I also found another way: "//div/p/strong/following-sibling::text()" fetches it as well. Added for reference. – babayetu Mar 27 '15 at 06:36
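
For reference, the XPath from the comment above can be tried like this. It is a small sketch that reuses the sample HTML from the question; the second expression, //div/p/text(), is my own addition and selects every text node directly under each <p>, whichever side of the <strong> it is on:

```python
from lxml import html

content = """
<div>
<p class="user_p">Blahblah<strong>cc</strong></p>
<p class="user_p"><strong>cc</strong>Blahblah</p>
</div>
"""

tree = html.fromstring(content)

# Text nodes that come after a <strong> inside the same <p>
after_strong = tree.xpath('//div/p/strong/following-sibling::text()')

# All text nodes that are direct children of a <p>
direct_text = tree.xpath('//div/p/text()')

print(after_strong)
print(direct_text)
```

The first query only matches the second paragraph, because in the first one nothing follows the <strong>.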