Get value using lxml

Question

I have the following html:

<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>

I want to get the value "2.35 : 1" from the content. However, when I try using lxml, it returns an empty string (I am able to get the 'Aspect Ratio' value, probably because that is neatly between tags.)

item.find('div').text

How would I then get the "2.35 : 1" value? Using etree.tostring does get me the full output.

alecxe · Accepted Answer · 2015-02-10T01:38:40.693

This is called the .tail of an element:

from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
print(root.xpath('//h4[@class="inline"]')[0].tail)

Prints 2.35 : 1.
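
To see why .text on the div comes back blank, it can help to print each piece separately. This is only a minimal sketch on the same sample markup; the variable names (div_text, h4_text, h4_tail) are mine, not from the question:

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
h4 = root.xpath('//h4[@class="inline"]')[0]
div = h4.getparent()

# .text is the text between an element's start tag and its first child.
div_text = (div.text or '').strip()
# The label sits inside <h4>, so it is the .text of the h4.
h4_text = h4.text
# .tail is the text between </h4> and the next tag.
h4_tail = (h4.tail or '').strip()

print(repr(div_text))
print(repr(h4_text))
print(repr(h4_tail))
```

The div's own .text holds only the whitespace before the h4, which is why it looks empty.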

As an alternative, you can get the following text sibling of the h4 element:

root.xpath('//h4[@class="inline"]/following-sibling::text()')[0]
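
If the page has several such blocks, the XPath version could be wrapped in a small lookup by label. This is a rough sketch only: value_after_label is a hypothetical helper of mine, not an lxml function, and it assumes every label sits in an h4 directly followed by its value:

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

def value_after_label(root, label):
    # Hypothetical helper (not part of lxml): find the h4 whose text matches
    # the label, then take the first text node that follows it.
    nodes = root.xpath(
        '//h4[normalize-space()=$label]/following-sibling::text()[1]',
        label=label,
    )
    return nodes[0].strip() if nodes else None

root = fromstring(data)
aspect = value_after_label(root, 'Aspect Ratio:')
missing = value_after_label(root, 'Runtime:')
print(aspect)
print(missing)
```

Passing the label as an XPath variable avoids quoting problems, and a label that is not on the page gives None rather than an IndexError.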

Also, make sure you are using lxml.html, since you are dealing with HTML data.

score 0 · Answer 2 · answered Feb 10 '15 at 01:14

You can also use .text_content() instead of .text, which will give you the entire text contents of the element (http://lxml.de/lxmlhtml.html):

>>> item.find('div').text_content()
Aspect Ratio: 2.35 : 1

The full statement would then be:

>>> title_detail.text_content().split('Aspect Ratio: ')[1].strip()
2.35 : 1
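
Splitting on 'Aspect Ratio: ' with the trailing space depends on the exact whitespace in the markup. A slightly more tolerant variant, as a sketch on the sample HTML from the question (the names full_text and value are my own), is to partition on the label without the space and strip what follows:

```python
from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
div = root.xpath('//div[@class="txt-block"]')[0]

# text_content() joins the text of the element and all of its descendants.
full_text = div.text_content()

# partition() returns (before, separator, after); the separator is empty
# when the label is not found, so the missing case can be detected.
before, label, after = full_text.partition('Aspect Ratio:')
value = after.strip() if label else None
print(value)
```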
