I use the non-validating reader to display or process untrusted XML documents. I do not need support for internal entities, but I do want to be able to process the documents even if a DOCTYPE is present.
With the disallow-doctype-decl feature of SAX I can make sure that parsing an XML document carries no risk of external entity resolution or "billion laughs" DoS expansion. This is also recommended by the OWASP XXE Prevention Cheat Sheet.
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setFeature("http://apache.org/xml/features/continue-after-fatal-error", true);
reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// or, alternatively, disable external entities and external DTD loading:
reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Unfortunately, the disallow-doctype-decl feature aborts parsing as soon as a DOCTYPE is encountered:
org.xml.sax.SAXParseException; systemId: file:... ; lineNumber: 2; columnNumber: 10;
DOCTYPE is disallowed when the
feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
If I ignore this fatal error (via continue-after-fatal-error), the parser still happily resolves internal entities, as you can see here: https://gist.github.com/ecki/f84d53a58c48b13425a270439d4ed84a
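For reference, here is a minimal self-contained sketch of the two behaviours described so far. It is not the code from the gist. It assumes the JDK's bundled Xerces-derived parser (other parsers may not recognize the Apache feature URIs), obtains the reader through SAXParserFactory rather than XMLReaderFactory, and uses a made-up test document and a hypothetical class name DoctypeDemo. It exercises the strict variant without continue-after-fatal-error, and the second ("or") feature set.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class DoctypeDemo {
    // Hypothetical test document: one internal entity, no external references.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE r [<!ENTITY a \"expanded\">]>\n"
        + "<r>&a;</r>";

    // Assumption: the default JAXP factory returns the JDK's Xerces-based parser.
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Parses XML with the given reader and returns the collected character data.
    static String text(XMLReader reader) throws Exception {
        StringBuilder sb = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        reader.setContentHandler(handler);
        // DefaultHandler.fatalError rethrows, so fatal errors surface as exceptions.
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(XML)));
        return sb.toString();
    }

    // Variant 1: disallow the DOCTYPE outright; returns true if parsing is aborted.
    static boolean doctypeRejected() throws Exception {
        XMLReader reader = newReader();
        reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        try {
            text(reader);
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    // Variant 2: only external entities and external DTD loading are disabled.
    static String withExternalEntitiesDisabled() throws Exception {
        XMLReader reader = newReader();
        reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
        reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return text(reader);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOCTYPE rejected: " + doctypeRejected());
        System.out.println("Text with variant 2: " + withExternalEntitiesDisabled());
    }
}
```

With variant 2 the document parses, but the text handed to the content handler contains the replacement text of &a; rather than a skipped entity, which is exactly the expansion I would like to avoid.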
Is there a combination of features that lets me read past the DOCTYPE declaration without evaluating it (in particular, without any recursive entity expansion)?
I would like to avoid defining my own Apache-specific security-manager property or a special resolver.