I am using lxml to parse an HTML file:

```python
from lxml import html

tree = html.parse(myfile)
data = tree.xpath('//p/text()')
```
I have 300 `<p>text</p>` tags in my HTML file, but `len(data)` is only 250 because sometimes I'll have `<p></p>` in my HTML. I want these to be included in `data` either as `'nan'` or `''`.
Any suggestions on how to do this?
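A minimal sketch of one possible approach, assuming the `lxml.html` module from the question: an empty `<p></p>` has no text node, so `//p/text()` has nothing to return for it. Selecting the `<p>` elements themselves with `//p` gives exactly one item per tag, and a placeholder can be substituted when the element has no text. The sample HTML string and the `paragraph_texts` helper are hypothetical, for illustration only. Note that `text_content()` also collects text from child elements such as `<b>`, whereas `//p/text()` returns only direct text nodes, so the two can differ on paragraphs that contain inline markup.

```python
from lxml import html

# Hypothetical sample: two paragraphs with text and one empty <p></p>.
SAMPLE = "<html><body><p>first</p><p></p><p>third</p></body></html>"


def paragraph_texts(tree, empty=""):
    """Return one entry per <p> element, using `empty` for paragraphs with no text.

    Hypothetical helper; works on either an ElementTree from html.parse()
    or an element from html.fromstring().
    """
    texts = []
    # Select the <p> elements, not their text() nodes, so empty ones are kept.
    for p in tree.xpath("//p"):
        # text_content() joins the text of <p> and all its descendants;
        # it yields an empty string for <p></p>.
        text = p.text_content()
        texts.append(str(text) if text else empty)
    return texts


tree = html.fromstring(SAMPLE)
print(len(tree.xpath("//p/text()")))  # 2: the empty <p> contributes no text node
print(paragraph_texts(tree))          # ['first', '', 'third']
print(paragraph_texts(tree, "nan"))   # ['first', 'nan', 'third']
```

With the file from the question, the same helper could be called as `paragraph_texts(html.parse(myfile), empty='nan')`. Passing `float('nan')` as `empty` instead would give a real NaN value rather than the string `'nan'`, if that suits the later processing better.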