I have the following HTML:

<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>

I want to extract the value "2.35 : 1" from this content. However, when I try with lxml, I get an empty string. (I can get the 'Aspect Ratio:' value, probably because it sits neatly between tags.)

item.find('div').text

How can I get the "2.35 : 1" value? Using etree.tostring does give me the full markup.

alecxe
David542

2 Answers

Text that follows an element's closing tag is stored in that element's .tail attribute:

from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
print(root.xpath('//h4[@class="inline"]')[0].tail)

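Prints 2.35 : 1. The raw tail also carries the surrounding whitespace, so call .strip() on it if you need the bare value.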
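To make the split between .text and .tail concrete, here is a minimal sketch that inspects both attributes on the markup from the question. The variable names are only illustrative, and it assumes lxml is installed.

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
h4 = root.xpath('//h4[@class="inline"]')[0]
div = h4.getparent()

# .text holds only the text BEFORE the first child element.
# For the div that is just the whitespace in front of <h4>.
div_text = div.text or ""

# The h4's own content is its .text ...
label = h4.text

# ... and whatever follows </h4> (up to the next tag) is its .tail.
aspect_ratio = h4.tail.strip()

print(repr(div_text), repr(label), repr(aspect_ratio))
```

This is why item.find('div').text looks empty: the value belongs to the h4 element's tail, not to the div's text.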

As an alternative, you can get the following text sibling of the h4 element:

root.xpath('//h4[@class="inline"]/following-sibling::text()')[0] 
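Indexing with [0] raises IndexError when the h4 is missing or has no text after it. A hedged sketch of a more defensive variant follows; the helper name get_aspect_ratio is hypothetical, and the second call shows XPath's normalize-space() doing the whitespace trimming instead of .strip().

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

def get_aspect_ratio(root):
    """Return the text after the h4, or None if there is none (hypothetical helper)."""
    texts = root.xpath('//h4[@class="inline"]/following-sibling::text()')
    return texts[0].strip() if texts else None

root = fromstring(data)
value = get_aspect_ratio(root)

# Same lookup, but trimmed inside the XPath expression itself.
# normalize-space() returns a string, so no indexing is needed.
normalized = root.xpath(
    'normalize-space(//h4[@class="inline"]/following-sibling::text()[1])'
)

# A fragment without the h4 yields None rather than an exception.
missing = get_aspect_ratio(fromstring("<div><p>no ratio here</p></div>"))

print(value, normalized, missing)
```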

Also, make sure you use lxml.html, since you are dealing with HTML data.

alecxe

You can also use .text_content() instead of .text. It is a method on the element that returns the entire text content of the element, including its children (http://lxml.de/lxmlhtml.html):

>>> item.find('div').text_content()
Aspect Ratio: 2.35 : 1

With the div element bound to title_detail, the full statement would then be:

>>> title_detail.text_content().split('Aspect Ratio: ')[1].strip()
2.35 : 1
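For reference, a self-contained sketch of the same idea on the markup from the question. It uses str.partition rather than split(...)[1], which is a swapped-in choice so that a missing label gives an empty string instead of an IndexError; the variable names are illustrative.

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
div = root.xpath('//div[@class="txt-block"]')[0]

# text_content() concatenates the text of the element and all descendants,
# including tails, so both the label and the value are present.
full_text = div.text_content()

# partition() returns (before, separator, after); 'after' is '' if the
# label is not found, so this never raises.
_, found, after = full_text.partition("Aspect Ratio:")
aspect_ratio = after.strip() if found else None

print(aspect_ratio)
```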
David542