Here is the problem that I am trying to solve.
- I have two folders which contain XML files.
- One folder - let's say the "source" folder - contains around 350,000 XML files.
- Another folder - let's say the "compare" folder - contains the same 350,000 XML files and a few more.
- The 350,000 files that are present in both folders have exactly the same names.
- However, the files in "source" are slightly different from the files in "compare". The files in compare may (or may not) have some extra nodes.
- I need to compare the identically named files from "source" and "compare". If, for every file in "source", all the nodes present in that file are also present in the corresponding file in "compare", I need to produce an OK report.
- If not, that is, if either of the following holds:
  - some file in "source" is not present in "compare", or
  - some file in "source" contains a node that is not present in the corresponding file in "compare",
- then I need to create an error report with the details of what is missing.
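
To make the two checks concrete, here is a rough sketch of what I have in mind, using only the JDK (no XMLUnit). Everything in it is my own assumption for illustration: the class name `SubsetCheck`, the method names, and above all the definition of "present", which here means "an element with the same tag name exists at the same position in the tree". Attributes and text content are ignored, and repeated siblings are paired greedily in document order, which may report false misses when same-named siblings have different subtrees.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of XMLUnit or any library.
class SubsetCheck {

    /** Parses an XML string and returns its root element. */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Names of *.xml files that exist in sourceDir but not in compareDir. */
    public static List<String> missingFiles(Path sourceDir, Path compareDir) {
        List<String> missing = new ArrayList<>();
        // DirectoryStream iterates lazily, so 350,000 entries are not loaded at once.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir, "*.xml")) {
            for (Path file : files) {
                if (!Files.exists(compareDir.resolve(file.getFileName()))) {
                    missing.add(file.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return missing;
    }

    /** Paths of elements in source that have no counterpart in compare. */
    public static List<String> missingNodes(Element source, Element compare) {
        List<String> missing = new ArrayList<>();
        String rootPath = "/" + source.getTagName();
        if (!source.getTagName().equals(compare.getTagName())) {
            missing.add(rootPath);
        } else {
            collectMissing(source, compare, rootPath, missing);
        }
        return missing;
    }

    /** Direct child elements only; text, comments and whitespace are skipped. */
    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                result.add((Element) children.item(i));
            }
        }
        return result;
    }

    private static void collectMissing(Element source, Element compare,
                                       String path, List<String> missing) {
        List<Element> candidates = childElements(compare);
        for (Element child : childElements(source)) {
            String childPath = path + "/" + child.getTagName();
            int index = -1;
            for (int i = 0; i < candidates.size(); i++) {
                if (candidates.get(i).getTagName().equals(child.getTagName())) {
                    index = i;
                    break;
                }
            }
            if (index < 0) {
                missing.add(childPath); // whole subtree is reported once, at its root
                continue;
            }
            // Each element in compare may be matched by only one element in source.
            Element match = candidates.remove(index);
            collectMissing(child, match, childPath, missing);
        }
    }

    public static void main(String[] args) {
        Element source = parse("<order><id>1</id><item><sku>A</sku></item></order>");
        Element compare = parse(
                "<order><id>1</id><item><sku>A</sku><qty>2</qty></item><note/></order>");
        System.out.println(missingNodes(source, compare)); // empty list: extra nodes are fine
        System.out.println(missingNodes(compare, source)); // lists the nodes source lacks
    }
}
```

For the real folders, I would presumably parse each file with `DocumentBuilder.parse(File)` instead of a string, write "OK" when both `missingFiles` and `missingNodes` return empty lists, and write the returned names and paths to the error report otherwise.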
I am currently trying Java + XMLUnit for this problem, but I am not sure whether it can solve it. Even if it can, I am not sure it is the best choice of tool.
Any help or suggestions would be much appreciated.