I am getting this error while parsing a web site: ERROR: 'The declaration for the entity "ContentType" must end with '>'.', or an error saying the input tag must be closed.
Have you considered JTidy ?
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.
Obviously, at some point it will struggle with the HTML, depending on how badly formed it is, but you may find it works for you.
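To see why a tolerant HTML parser helps, here is a minimal sketch that uses only the JDK's built-in XML parser (not JTidy). It assumes the page is being fed to a strict XML parser such as `DocumentBuilder`. The class name `StrictParseDemo` and the HTML snippets are hypothetical. An unclosed `<input>` tag is normal in HTML but is not well-formed XML, so the strict parser rejects it. The self-closed XHTML form is accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows how a strict XML parser treats HTML vs. XHTML markup.
public class StrictParseDemo {

    /** Returns true if the markup is well-formed XML, false if the strict parser rejects it. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // Well-formedness error: report where the parser gave up
            System.out.println("Rejected at line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML: <input> is a void element with no closing tag
        String html = "<html><body><form><input type=\"text\" name=\"q\"></form></body></html>";
        // The same markup as XHTML: the void element is self-closed
        String xhtml = "<html><body><form><input type=\"text\" name=\"q\"/></form></body></html>";
        System.out.println("HTML well-formed? " + isWellFormedXml(html));
        System.out.println("XHTML well-formed? " + isWellFormedXml(xhtml));
    }
}
```

A tool like JTidy sits in front of this step. It repairs markup like the first snippet, and it can be configured to emit XHTML, so that a DOM can be built from real-world pages.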

Brian Agnew