I want to validate an XML file against its schema. Once the validation is complete, I want to remove any invalid data and save that invalid data to a new file. I can perform the validation; I am just stuck on removing the invalid data and saving it to a new file.
-
Can you give us some example input and output? What would you be saving to a file if a tag was missing a closing bracket? – Daniel Kaplan Jan 28 '13 at 18:44
-
What sort of parser are you using for validation? – sdasdadas Jan 28 '13 at 18:47
-
I am using SAXParser. Any invalid data is to be removed. Example below: the 2nd and 4th topic nodes are invalid. I want to delete those two nodes and save them to a new file:
google http://www.google.com 654654 http://www.google.com google http://www.google.com google http -
I haven't used it, but try using this tutorial: http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html Specifically, you'll want to handle SAXExceptions when they occur, and store the results of those exceptions as your invalid data. Then you can look through the file again, remove any invalid data that you found, and store it in a new file. – sdasdadas Jan 28 '13 at 19:13
-
I found the following article that validates the xml. How would I go about removing and saving errors found in a new file? http://www.herongyang.com/XML-Schema/JAXP-XSD-Schema-XML-DOM-Validator-Error-Handler.html – mick Jan 28 '13 at 20:20
-
@mick Every time you catch an exception with that error handler, use System.out.println to figure out what sort of exception you caught. Then you can save those Exceptions (once you figure out what to save). After all the exceptions have been saved, go over the .xml file again and remove all of the Strings that the exceptions threw. – sdasdadas Jan 28 '13 at 21:37
-
Any idea on the code itself? – mick Jan 28 '13 at 21:51
-
When you say you want to "remove invalid data", does that mean that you want to be sure that what remains after removing it is a file that's valid against the schema? That's pretty challenging, and it very much depends on the nature of the invalidities. I could envisage detecting say attributes that are invalid against a simple type and substituting an attribute with some default (valid) value. – Michael Kay Jan 28 '13 at 22:14
-
I can pinpoint the line number where the validation error occurs. Is there a way to remove the node by referencing the line number? – mick Jan 28 '13 at 22:37
1 Answer
I take back everything I just wrote. ... :) It seems you can get the node you need by reading the validator's current-element-node property at the moment the validation error is reported:
Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
This property is specific to Xerces, so I think it will work as long as the Schema (and therefore the Validator) is created through Xerces. See http://xerces.apache.org/xerces2-j/properties.html#dom.current-element-node .
There is more explanation in the answer to "How can I get more information on an invalid DOM element through the Validator?".
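To make the idea concrete, here is a minimal sketch of how the property could be used to split a document into a cleaned part and a rejected part. It rests on several assumptions that are not in the question: the document is parsed into a DOM and validated through a DOMSource (rather than with a SAXParser), the Validator is backed by Xerces or the JDK's built-in fork of it so that the property is recognised, and the class name InvalidNodeSplitter, the recordName parameter and the "rejected" wrapper element are all made up for illustration. Invalid elements are only collected during validation and removed afterwards, so the tree is not modified while the validator is walking it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; the name and structure are illustrative only.
public class InvalidNodeSplitter {
    // Xerces-specific property; assumed to be supported by the Validator in use.
    static final String CURRENT_ELEMENT =
            "http://apache.org/xml/properties/dom/current-element-node";

    /** Returns { cleaned XML, rejected XML }. recordName is the element to drop, e.g. "topic". */
    public static String[] split(String xml, String xsd, final String recordName) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        final Validator validator = schema.newValidator();
        // A set, because one bad element can trigger more than one error.
        final Set<Element> invalid = new LinkedHashSet<>();

        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
            public void error(SAXParseException e) throws SAXException {
                // Ask the validator which DOM element it was on when the error occurred,
                // then climb to the enclosing record element (if any).
                Node n = (Node) validator.getProperty(CURRENT_ELEMENT);
                while (n != null
                        && !(n instanceof Element && recordName.equals(n.getNodeName()))) {
                    n = n.getParentNode();
                }
                if (n != null) {
                    invalid.add((Element) n);
                }
            }
        });
        // The property is only meaningful when validating a DOM tree.
        validator.validate(new DOMSource(doc));

        // Move every invalid record out of the original tree into a second document.
        Document rejected = builder.newDocument();
        Element rejectedRoot = rejected.createElement("rejected");
        rejected.appendChild(rejectedRoot);
        for (Element e : invalid) {
            e.getParentNode().removeChild(e);
            rejectedRoot.appendChild(rejected.importNode(e, true));
        }
        return new String[] { serialize(doc), serialize(rejected) };
    }

    static String serialize(Document d) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(d), new StreamResult(w));
        return w.toString();
    }
}
```

To write files instead of strings, the same Transformer can be pointed at a StreamResult wrapping a File. As noted in the comments above, what remains after removal is not guaranteed to be valid against the schema (for example, if the schema requires a minimum number of records), so re-validating the cleaned document may be worthwhile.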