I am trying to extract the cell values from this:

<tr>
     <td>a</td>
     <td>a</td>
     <td>b</td>
     <td></td>
     <td>b</td> 
</tr>

The XPath expression //td/text() returns only four values, skipping the empty cell:

a
a
b
b

How can I get the following output, with an empty line for the empty td?

a
a
b

b
kevin

1 Answer

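If you are using lxml.html, loop over the td elements and call text_content() on each one. The text() step selects only text nodes, and the empty td has none, so it is skipped; text_content() returns an empty string for that cell instead: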

from lxml.html import fromstring

data = """
<tr>
     <td>a</td>
     <td>a</td>
     <td>b</td>
     <td></td>
     <td>b</td>
</tr>"""

tree = fromstring(data)

for td in tree.xpath(".//td"):
    print(td.text_content())

Prints:

a
a
b

b
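
If lxml is not available, the same idea can be sketched with the standard library's xml.etree.ElementTree. This assumes the markup is well-formed XML, as the snippet in the question is; real-world HTML often is not, in which case lxml.html is the safer choice. The name cells is only for illustration:

```python
import xml.etree.ElementTree as ET

data = """
<tr>
     <td>a</td>
     <td>a</td>
     <td>b</td>
     <td></td>
     <td>b</td>
</tr>"""

# Assumption: the input parses as XML (every tag closed and properly nested).
root = ET.fromstring(data.strip())

# itertext() yields every text node inside the element; an empty <td>
# yields nothing, so the join produces "" instead of dropping the cell.
cells = ["".join(td.itertext()) for td in root.iter("td")]

for cell in cells:
    print(cell)
```

As with text_content(), each td contributes exactly one entry, so the empty cell keeps its position in the output.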
alecxe