I am trying to read HTML content and extract the last table's contents into a list using lxml.
Here is my last table:
<table border="1">
<thead>
<tr>
<td><p>T1</p></td>
<td><p>T2</p></td>
<td><p>T3</p></td>
</tr>
</thead>
<tbody>
<tr>
<td><p>A1</p></td>
<td><p></p></td>
<td><p>A3</p></td>
</tr>
</tbody>
</table>
When I run the code below, the value of `eol_table` is `['T1', 'T2', 'T3', 'A1', 'A3']`. It does not include `None` or a blank value for the cell whose `<p>` content is empty.
The expected value is `['T1', 'T2', 'T3', 'A1', '', 'A3']`. How can I get this result?
Code:
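import lxml.html as LH

# urlfetch is assumed to be imported already (e.g. from the App Engine API).
eol_html_content = urlfetch.fetch("https://dl.dropboxusercontent.com/u/7384181/Test.html").content
html_root = LH.fromstring(eol_html_content)
eol_table = None
# Each iteration overwrites eol_table, so it ends up holding the last table's cells.
for tbl in html_root.xpath('//table'):
    eol_table = tbl.xpath('.//tr/td/p/text()')
self.response.out.write(eol_table)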
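One possible way to get the expected list is to select the `p` elements themselves instead of their text nodes, because `text()` only matches text nodes that actually exist and so skips an empty `<p></p>`. The sketch below is a minimal illustration, not a definitive fix: `SAMPLE_HTML` and `last_table_cells` are hypothetical names, and the markup is copied from the question rather than fetched from the URL.

```python
import lxml.html as LH

# Sample markup copied from the question; the real page would be fetched instead.
SAMPLE_HTML = """
<html><body>
<table border="1">
<thead>
<tr><td><p>T1</p></td><td><p>T2</p></td><td><p>T3</p></td></tr>
</thead>
<tbody>
<tr><td><p>A1</p></td><td><p></p></td><td><p>A3</p></td></tr>
</tbody>
</table>
</body></html>
"""


def last_table_cells(html):
    """Return the text of every <p> in the last table, '' for empty ones."""
    root = LH.fromstring(html)
    tables = root.xpath('//table')
    if not tables:
        return []
    # Select the <p> elements, not their text nodes, so empty ones are kept.
    p_elements = tables[-1].xpath('.//tr/td/p')
    # An empty <p></p> has .text == None; replace that with ''.
    return [p.text or '' for p in p_elements]


cells = last_table_cells(SAMPLE_HTML)
print(cells)
```

Indexing with `tables[-1]` picks the last table directly, which has the same effect as letting the loop overwrite `eol_table` on every iteration.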
If a cell has no `<p>` tag, how do I add `None` to the list for that column?
To handle cells without `<p>` elements, you should change `p_elements = tbl.xpath(".//tr/td/p")` to `td_elements = tbl.xpath(".//tr/td")`. Then loop over the found `td` elements: if a `td` contains no `p` element, return `None`; if it does, return the `text()` of that `p`. Since this makes the loop over `td` a bit longer, I would not use a list comprehension but an ordinary `for` loop over the found `td` elements. Try writing that loop yourself (or ask another question).
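For anyone who wants to compare against their own attempt, here is a minimal sketch of the `td` loop described above. It rests on assumptions: `SAMPLE_HTML` and `cell_values` are hypothetical names, the sample row is invented so that one cell has no `<p>` at all, and only a `<p>` that is a direct child of the `td` is considered.

```python
import lxml.html as LH

# Hypothetical sample: the second cell contains no <p> element at all.
SAMPLE_HTML = """
<html><body>
<table border="1">
<tbody>
<tr><td><p>A1</p></td><td></td><td><p>A3</p></td></tr>
</tbody>
</table>
</body></html>
"""


def cell_values(tbl):
    """Return one entry per <td>: None if it has no <p>, else the <p> text."""
    values = []
    for td in tbl.xpath('.//tr/td'):
        p = td.find('p')  # first direct <p> child, or None if there is none
        if p is None:
            values.append(None)
        else:
            # An empty <p></p> has .text == None; report it as ''.
            values.append(p.text or '')
    return values


root = LH.fromstring(SAMPLE_HTML)
last_table = root.xpath('//table')[-1]
values = cell_values(last_table)
print(values)
```

Keeping `None` for a missing `<p>` and `''` for an empty one lets the caller tell the two cases apart; if that distinction does not matter, both branches could append the same placeholder.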