I am getting this error while parsing a web site: ERROR: 'The declaration for the entity "ContentType" must end with '>'.' or an error saying the input type must be closed.

Ashu

1 Answer

Have you considered JTidy?

JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.

Obviously, at some point it will struggle if the HTML is badly formed enough, but you may find it works for you.
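To illustrate why a lenient parser is worth trying, here is a minimal sketch using only the JDK's built-in XML parser. It assumes the errors in the question come from feeding real-world HTML to a strict XML parser, which the question does not confirm. The class name `StrictParseDemo` and the helper `parsesAsXml` are made up for this example. An unclosed `<input>` tag, which is perfectly normal HTML, is rejected as malformed XML:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of JTidy or of the asker's code.
class StrictParseDemo {

    /** Returns true if the JDK's XML parser accepts the markup as well-formed XML. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any well-formedness violation ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Typical HTML: the input element is never closed.
        String html = "<form><input type=\"text\" name=\"q\"></form>";
        // Same markup with the element self-closed, as XML requires.
        String xhtml = "<form><input type=\"text\" name=\"q\"/></form>";

        System.out.println(parsesAsXml(html));  // false
        System.out.println(parsesAsXml(xhtml)); // true
    }
}
```

If this is what is happening, running the page through JTidy first (its `Tidy` class has a `parseDOM` method that returns a `org.w3c.dom.Document`) should give you a DOM without the strict parser ever seeing the raw HTML.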

Brian Agnew