I am trying to extract the cell values from this table row:
<tr>
<td>a</td>
<td>a</td>
<td>b</td>
<td></td>
<td>b</td>
</tr>
//td/text() skips the empty cell and returns:
a
a
b
b
How can I get the following output, with an empty line for the empty cell?
a
a
b

b
If you are using lxml.html, loop over the td elements found and call text_content() on each:
from lxml.html import fromstring
data = """
<tr>
<td>a</td>
<td>a</td>
<td>b</td>
<td></td>
<td>b</td>
</tr>"""
tree = fromstring(data)
for td in tree.xpath(".//td"):
    print(td.text_content())
Prints:
a
a
b

b
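If lxml is not available, the same idea can be sketched with the standard library. This is only a rough alternative, and it assumes the markup is well-formed XML (true for the snippet above, but not for HTML in general). The name values is just an illustrative variable, not something from the question.

```python
import xml.etree.ElementTree as ET

data = """
<tr>
<td>a</td>
<td>a</td>
<td>b</td>
<td></td>
<td>b</td>
</tr>"""

# Assumption: the snippet parses as XML; real-world HTML often will not.
row = ET.fromstring(data)

# Build one entry per <td>. itertext() yields nothing for an empty cell,
# so the join gives "" instead of dropping the cell.
values = ["".join(td.itertext()) for td in row.findall(".//td")]

for value in values:
    print(value)
```

The point in both versions is to iterate over the td elements rather than over their text nodes, since an empty td has no text node for text() to select.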