
I am comparing two XML files using WinMerge. The files are deployment files, and I'm looking for variation between the environments. The main issue is that the XML files are littered with tags that only reflect a change in an underlying ID, e.g. <tableId>123</tableId>, and these differences are unimportant for the comparison.

I want to create a regex that I can use in WinMerge to exclude such elements, so that only the interesting elements are compared, e.g. compare only the <name> element in the example below.

Environment 1

<table>
 <tableInfo>
 <tableId>293</tableId>
 <name>Table Name New</name>
 <repositoryId>0</repositoryId>

Environment 2

<table>
 <tableInfo>
 <tableId>965</tableId>
 <name>Table Name Old</name>
 <repositoryId>0</repositoryId>

Please note that the application producing the XML writes these elements out line by line, in a fixed order, so this does not need to be a true XML compare.

user1605665

1 Answer


I would not recommend using a regex for this. To do it accurately, you would effectively need to parse the XML, and that is not a job for a regex.

WinMerge is a line-based diff tool, which makes it a poor fit for XML. I would recommend trying an XML-aware diff tool that understands XML's tree structure. Most such tools appear to be commercial products, but diffxml is open source and may be worth a look.

An XML-aware diff should be inherently more accurate, since it takes the tree structure into account instead of working purely line by line. You could then dig further into the differences with an XML parser, such as ElementTree in Python, targeting the tags you consider interesting and comparing them to each other to see whether they differ.

If diffxml proves too unwieldy, it may be worth doing the parsing yourself with ElementTree or similar (e.g. lxml) and comparing the two sources directly, looking only at the tags you are interested in.
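As a rough illustration of the do-it-yourself approach, here is a minimal sketch using ElementTree and difflib from the standard library. It assumes the files are well formed, so the two snippets from the question are completed with closing tags here. The names IGNORED_TAGS, leaf_values and diff_environments are hypothetical helpers made up for this example, and tableId is only a guess at which elements you want to skip.

```python
import difflib
import xml.etree.ElementTree as ET

# Assumption: completed, well-formed versions of the snippets in the question.
ENV1 = """<table>
 <tableInfo>
  <tableId>293</tableId>
  <name>Table Name New</name>
  <repositoryId>0</repositoryId>
 </tableInfo>
</table>"""

ENV2 = """<table>
 <tableInfo>
  <tableId>965</tableId>
  <name>Table Name Old</name>
  <repositoryId>0</repositoryId>
 </tableInfo>
</table>"""

# Assumption: elements whose values differ per environment and should be skipped.
IGNORED_TAGS = {"tableId"}


def leaf_values(xml_text, ignored=IGNORED_TAGS):
    """Return "path = text" strings for every leaf element not in `ignored`."""
    lines = []

    def walk(elem, path):
        current = f"{path}/{elem.tag}"
        children = list(elem)
        if not children:
            # Leaf element: record it unless its tag is on the ignore list.
            if elem.tag not in ignored:
                lines.append(f"{current} = {(elem.text or '').strip()}")
            return
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), "")
    return lines


def diff_environments(xml_a, xml_b):
    """Diff the filtered leaf values of two XML documents, in document order."""
    diff = difflib.unified_diff(
        leaf_values(xml_a), leaf_values(xml_b), lineterm="", n=0
    )
    # Drop the unified-diff header and hunk markers, keep only -/+ lines.
    return [d for d in diff if not d.startswith(("---", "+++", "@@"))]


if __name__ == "__main__":
    for line in diff_environments(ENV1, ENV2):
        print(line)
```

For the two environments above, this reports the differing name values and stays silent about tableId. To run it against real files, read each file's text and pass it to diff_environments, or swap ET.fromstring for ET.parse(path).getroot().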

In short, I think XML parsers, perhaps in combination with an XML-aware diff tool, will be more useful than pure regexes in this case.

khampson