So I've got a large number of XML files. For years they've caused trouble because the people who write them do so by hand, so errors naturally creep in. It's high time we got around to validating them and giving feedback on what's wrong when someone tries to use these XML files.
I'm using a SAX parser with a custom error handler and collecting a list of errors.
Below is my code:
BookValidationErrorHandler errorHandler = new BookValidationErrorHandler();
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("test.xsd")}));
javax.xml.parsers.SAXParser parser = factory.newSAXParser();
org.xml.sax.XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(errorHandler);
reader.parse(new InputSource("bad.xml"));
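BookValidationErrorHandler itself isn't shown here. For readers following along, a handler of this kind might look roughly like the sketch below. The class name CollectingErrorHandler and its body are assumptions for illustration, not the actual implementation; the message format is simply modelled on the error output quoted in this post.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for BookValidationErrorHandler (the real class is not shown).
// It records every warning, error, and fatal error instead of throwing.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record(e);
    }

    @Override
    public void error(SAXParseException e) {
        record(e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record(e);
    }

    // Formats each problem with the line number reported by the parser.
    private void record(SAXParseException e) {
        messages.add("Line Number: " + e.getLineNumber() + ": " + e.getMessage());
    }

    List<String> getMessages() {
        return messages;
    }
}
```

Collecting rather than throwing lets the parser keep going after a validity error, so one run can report several problems per file. Well-formedness problems (fatal errors) will usually still stop the parse.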
The first couple of errors are always:
Line Number: 2: Document is invalid: no grammar found.
Line Number: 2: Document root element "credits", must match DOCTYPE root "null".
We can't possibly go and edit the thousands of XML files that need to be checked.
Is there anything I can easily add to the front of the source to prevent this? Is there a way to tell the parser to ignore these DTD-related errors? I'm not even sure what the grammar one means, though I sort of understand the second one.
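One direction that may be worth trying, going by the SAXParserFactory documentation: setValidating(true) switches on DTD validation, and a document with no DOCTYPE then produces exactly these "no grammar found" and "DOCTYPE root" complaints. Leaving setValidating at false while still calling setSchema should give XSD-only validation. The sketch below illustrates that idea. The class SchemaOnlyValidation, the validate helper, and the use of in-memory strings instead of test.xsd and bad.xml are assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original code.
class SchemaOnlyValidation {

    // Validates the XML text against the XSD text and returns the schema errors found.
    static List<String> validate(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);    // leave DTD validation off
            factory.setNamespaceAware(true); // XSD validation needs namespace awareness
            factory.setSchema(
                    schemaFactory.newSchema(new StreamSource(new StringReader(xsd))));

            XMLReader reader = factory.newSAXParser().getXMLReader();
            // DefaultHandler ignores warnings and rethrows fatal errors;
            // only recoverable validation errors are collected here.
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add("Line Number: " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Covers setup failures and fatal (well-formedness) errors.
            throw new IllegalStateException("Could not run validation", e);
        }
        return errors;
    }
}
```

With this configuration the XML files themselves would not need a DOCTYPE or any other edit, since the schema is attached on the parser side.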