I have the below HTML content:
<html>
<body>
<div>
<p><img class="img.jpg" /></p>
</div>
</body>
</html>
I am trying to parse this HTML with the lxml parser, as below:
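import lxml.html as LH

# html holds the markup shown above
root = LH.fromstring(html)
for el in root.iter('img'):
    # copy the class value into a new src attribute
    el.attrib['src'] = el.attrib['class']
# encoding='unicode' makes tostring() return str, so the concatenation also works on Python 3
content = '<html><body>' + LH.tostring(root, encoding='unicode') + '</body></html>'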
I am getting the content after parsing as below:
<html>
<body>
<div>
<p><img class="img.jpg" src="img.jpg"></p>
</div>
</body>
</html>
As you can see, the closing "/>" of the <img /> tag has been removed after parsing. Is there any way I can retain the closing tags when the parsed HTML is written back out?
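For reference, here is a minimal sketch of one direction that may help, assuming the XML serializer is acceptable for the output: lxml.html.tostring() takes a method argument that defaults to 'html', and passing method='xml' makes it write empty elements in self-closing form. The shortened markup and the names html_out and xml_out are only for illustration.

```python
import lxml.html as LH

# Shortened version of the markup from the question (illustrative input)
html = '<div><p><img class="img.jpg" /></p></div>'

root = LH.fromstring(html)
for el in root.iter('img'):
    # copy the class value into a new src attribute
    el.attrib['src'] = el.attrib['class']

# Default HTML serialization: void elements such as <img> get no closing slash
html_out = LH.tostring(root, encoding='unicode')

# XML serialization: empty elements are written in self-closing form
xml_out = LH.tostring(root, encoding='unicode', method='xml')

print(html_out)
print(xml_out)
```

One caveat with this approach: the XML serializer applies the self-closing form to every empty element, not just void ones, so an empty <script> or <div> would also come out as <script/> or <div/>, which browsers may not treat the same way when the result is served as HTML.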