I'm parsing SGML with StAX. The SGML contains characters like "“ ”" and many others that are not replaced when I set the encoding to UTF-8. The parse breaks and throws the following exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[6,22]
Message: The entity "lpar" was referenced, but not declared.

I have another problem: some tags have no closing tag, for example <coolspan> without an end tag. This also breaks the parse.

I was thinking about writing a method that replaces all the special characters and handles tags without an end tag. Has anyone run into a problem like this and could point me in the right direction?
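
To make the idea concrete, here is a minimal sketch of such a preprocessing step, not a definitive solution. It assumes the whole document fits in a String, and the class name `SgmlPreprocessor`, the entity table, and the list of empty elements are all hypothetical and would have to be filled in from the real SGML DTD. Named entities are rewritten as numeric character references (which XML accepts without a declaration), and start tags of elements assumed to never have content are turned into self-closing tags.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SgmlPreprocessor {
    // Hypothetical, incomplete table: SGML entity name -> Unicode code point.
    private static final Map<String, Integer> ENTITIES = Map.of(
            "lpar", 0x28, "rpar", 0x29, "ldquo", 0x201C, "rdquo", 0x201D);

    // Hypothetical list of elements assumed to never have content or an end tag.
    private static final Set<String> EMPTY_ELEMENTS = Set.of("coolspan");

    private static final Pattern ENTITY_REF =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");
    // Simplification: breaks if an attribute value contains '<' or '>'.
    private static final Pattern START_TAG =
            Pattern.compile("<([A-Za-z][\\w.-]*)(\\s[^<>]*)?>");

    // Replaces known named entities with numeric character references.
    // Unknown names (and the predefined &amp;, &lt;, ...) are left untouched.
    static String replaceEntities(String input) {
        Matcher m = ENTITY_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = ENTITIES.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Rewrites <name ...> as <name .../> for elements listed in EMPTY_ELEMENTS.
    static String selfCloseEmptyElements(String input) {
        Matcher m = START_TAG.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String attrs = (m.group(2) == null) ? "" : m.group(2);
            boolean needsClosing = EMPTY_ELEMENTS.contains(m.group(1)) && !attrs.endsWith("/");
            String replacement = needsClosing ? "<" + m.group(1) + attrs + "/>" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String sgml = "<doc>f&lpar;x&rpar; <coolspan>done</doc>";
        String xml = selfCloseEmptyElements(replaceEntities(sgml));

        // Feed the cleaned-up text to StAX and print the character data.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                System.out.print(reader.getText());
            }
        }
        reader.close();
    }
}
```

This only covers elements that are always empty. If an element can have content and its end tag is merely omitted, a regex cannot tell where it ends, and an SGML-aware or lenient HTML-style parser is probably the better route.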

Shree Krishna
  • I don't think you will succeed in parsing a document that does not follow XML rules with an XML parser. – Henry May 19 '16 at 18:03
  • Also: http://stackoverflow.com/questions/34577021/how-to-ignore-unclosed-tags-in-xml-or-html – kjhughes May 19 '16 at 18:32
  • Those entities have to be declared, and tags have to be closed, or you don't have well-formed XML. – kjhughes May 19 '16 at 18:40

0 Answers