I have the following code that removes duplicate paragraphs from an HTML file.
from bs4 import BeautifulSoup

fp = open("Input.html", "rb")
soup = BeautifulSoup(fp, "html5lib")
elms = []
for elem in soup.find_all('font'):
    if elem not in elms:
        elms.append(elem)
    else:
        target = elem.findParent().findParent()
        target.decompose()
print(soup.html)
It is almost working, but for some elements I get this error:
AttributeError: 'NoneType' object has no attribute 'findParent'
Is there a way to print the line number within the HTML file where the error happens, so I can check how that element is formatted?
The structure of the elements for which the code works without issues looks like this:
<!DOCTYPE html>
<html>
<body>
<p align="left">
<b><font face="Times New Roman" size="5" color="red">Some text</font></b>
</p>
</body>
</html>
But since the file is fairly large, I have not yet identified the structure of the elements where the code gets stuck.
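
For reference, here is a minimal sketch of how the failing elements might be located. It rests on several assumptions: a Beautiful Soup version recent enough to expose the `sourceline` and `decomposed` attributes on tags, a hypothetical helper named `dedupe_font_paragraphs`, and comparing the serialized markup of each `font` tag instead of the tags themselves (a swapped-in duplicate check, not the original one). Line numbers are recorded before anything is removed, because `decompose()` wipes the attributes of every tag inside the removed subtree.

```python
from bs4 import BeautifulSoup


def dedupe_font_paragraphs(html, parser="html.parser"):
    """Remove the grandparent of each duplicate <font> tag.

    Returns the modified soup and the source line numbers of <font> tags
    that could not be processed in the usual way. Hypothetical helper,
    written for illustration only.
    """
    soup = BeautifulSoup(html, parser)

    # Snapshot markup and line number up front: decompose() clears these
    # attributes on every tag inside the subtree it removes.
    fonts = [(elem, str(elem), elem.sourceline) for elem in soup.find_all("font")]

    seen = set()
    skipped = []
    for elem, markup, line in fonts:
        if elem.decomposed:
            # Already destroyed together with an earlier duplicate's ancestor.
            skipped.append(line)
            continue
        if markup not in seen:
            seen.add(markup)
            continue
        parent = elem.parent
        target = parent.parent if parent is not None else None
        if target is None:
            # No grandparent exists, so there is nothing to remove.
            skipped.append(line)
            continue
        target.decompose()
    return soup, skipped
```

Calling `soup, skipped = dedupe_font_paragraphs(open("Input.html", "rb").read(), "html5lib")` and printing `skipped` should then list the lines worth inspecting in the HTML file, for example paragraphs that contain more than one `font` tag.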