
I have the below HTML content:

<html>

<body>
    <div>
        <p><img class="img.jpg" /></p>
    </div>
</body>

</html>

and I am trying to parse the HTML using the lxml HTML parser as below:

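import lxml.html as LH

# html holds the document shown above as a str
root = LH.fromstring(html)
for el in root.iter('img'):
    # copy the class value into a new src attribute
    el.attrib['src'] = el.attrib['class']
# root is already the <html> element, so no extra wrapper is needed;
# encoding='unicode' makes tostring() return str rather than bytes
content = LH.tostring(root, encoding='unicode')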

I am getting the content after parsing as below:

<html>

<body>
    <div>
        <p><img class="img.jpg" src="img.jpg"></p>
    </div>
</body>

</html>

As you can see, the <img> tag's self-closing slash (/>) has been removed after parsing. Is there any way I can retain all the HTML closing tags after parsing?

Nishant
venu gopal
  • Is there any way I can achieve this using lxml's HTML parser, or do I have to use lxml's XML parser? – venu gopal Feb 07 '20 at 07:33
  • Does this answer your question? [Why is the <img> tag not closed in HTML?](https://stackoverflow.com/questions/23890716/why-is-the-img-tag-not-closed-in-html) – Nishant Feb 07 '20 at 08:09
  • I think the closing slash is not needed in HTML; please see the linked question. Would it work if you parse it as XML? – Nishant Feb 07 '20 at 08:11
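
The idea raised in the comments can be tried without switching parsers. This is a minimal sketch, assuming the installed lxml version accepts the method argument on lxml.html.tostring (it defaults to 'html'); the html string and the variable names as_html and as_xml are made up for illustration. The document is still parsed with the HTML parser, and only the serialization step changes:

```python
import lxml.html as LH

# Illustrative input, equivalent to the document in the question
html = '<html><body><div><p><img class="img.jpg" /></p></div></body></html>'

root = LH.fromstring(html)
for el in root.iter('img'):
    # copy the class value into a new src attribute
    el.set('src', el.get('class'))

# Default HTML serialization: void elements such as <img> get no closing slash
as_html = LH.tostring(root, encoding='unicode')

# XML serialization of the same tree: empty elements are written self-closed
as_xml = LH.tostring(root, encoding='unicode', method='xml')

print(as_html)
print(as_xml)
```

With method='xml' the image should come out as <img class="img.jpg" src="img.jpg"/>. One caveat: XML serialization self-closes every empty element, so an empty <p> or <script> would also be written as <p/> or <script/>, which browsers do not treat the same way when the result is served as HTML.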

0 Answers