I am new python. I have some predefined xml files. I have a script which generate new xml files. I want to write an automated script which compares xmls files and stores the name of differing xml file names in output file? Thanks in advance
-
A little more context will be useful. Are you starting with a list of filenames, or a directory where the files live? Do you want to compare if the files are textually identical (easy), or just if the xml data is the same (trickier)? Do you want to compare every file against every other file, and generate a list of all pairs of files that are not the same? Or are you comparing them all to one file? Or do you just want a function that compares any two xml files? – Brionius Aug 08 '13 at 05:06
-
I have a pair of XML files, I predefined and 1 generated by another script. I want to write a script or a function which will take path of source and destination XMLs, compare them. If both file contents are same, move to next file pair else if XML content is not same store the destination XML path in output file – Manu Aug 08 '13 at 05:14
4 Answers
I think you're looking for the filecmp
module. You can use it like this:
import filecmp
cmp = filecmp.cmp('f1.xml', 'f2.xml')
# Files are equal
if cmp:
continue
else:
out_file.write('f1.xml')
Replace f1.xml
and f2.xml
with your xml files.
Do you speak of comparing them byte-wise or for semantic equality?
(Is <tag attr1="1" attr2="2" />
equal to <tag attr2="2" attr1="1" />
?)
If you want to check for semantic equality have a look at Xml comparison in Python
When generating xml especially if using normal dicts for the attributes somewhere attribute order can be mixed up even sometimes when you use the same script with the same input.
items()
...
CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

- 2,626
- 1
- 17
- 15
Building on @Xaranke's answer:
import filecmp
out_file = open("diff_xml_names.txt")
# Not sure what format your filenames will come in, but here's one possibility.
filePairs = [('f1a.xml', 'f1b.xml'), ('f2a.xml', 'f2b.xml'), ('f3a.xml', 'f3b.xml')]
for f1, f2 in filePairs:
if not filecmp.cmp(f1, f2):
# Files are not equal
out_file.write(f1+'\n')
out_file.close()

- 13,858
- 3
- 38
- 49
What about the following snippet :
def separator(self):
return "!@#$%^&*" # Very ugly separator
def _traverseXML(self, xmlElem, tags, xpaths):
tags.append(xmlElem.tag)
for e in xmlElem:
self._traverseXML(e, tags, xpaths)
text = ''
if (xmlElem.text):
text = xmlElem.text.strip()
xpaths.add("/".join(tags) + self.separator() + text)
tags.pop()
def _xmlToSet(self, xml):
xpaths = set() # output
tags = list()
root = ET.fromstring(xml)
self._traverseXML(root, tags, xpaths)
return xpaths
def _areXMLsAlike(self, xml1, xml2):
xpaths1 = self._xmlToSet(xml1)
xpaths2 = self._xmlToSet(xml2)
return xpaths1 == xpaths2