
I am trying to compare the two XML files shown below in Python and would like your input on my approach.

File 1:

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>

File 2:

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>

If you look closely, File 2 has no line <p1:feature car="111" type="tires" unit="percent">511</p1:feature> in the 2nd paragraph, although it is present in File 1.

Also, the last line of the 2nd paragraph of File 2 has car="440", whereas in File 1 it is car="448".

What I want:

The files I am dealing with contain numerous such differences, so can you tell me how to print out such missing lines and unequal numbers from these files? I want the output in the following form:

In file two feature car="111", type="tires" and text = 511 is missing
In file two car="440" whereas in file one it is car="448"

Ideas and alternative methods are also welcome. I have been stuck on this problem for a long time and would like to solve it soon.

What I tried:

I am using lxml for the comparison, and I tried a for loop in the following manner:

for i, j in zip(file1.getchildren(), file2.getchildren()):
    if (int(i.get("car")) & int(i.text)) != (int(j.get("car")) & int(j.text)):
        print "difference of both files"

Because this compares the files line by line, every result from the 2nd paragraph onward is wrong: one line is missing from File 2, so all the following lines are paired with the wrong partner.

Radheya

1 Answer


I think what you want is difflib. Please take a look at the official documentation.

In general words, what you want is:

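from difflib import Differ

# Differ.compare() expects sequences of lines; passing whole strings
# would compare them character by character.
lines_1 = file_1.read().splitlines(True)  # XML contents of the first file
lines_2 = file_2.read().splitlines(True)  # XML contents of the second file
d = Differ()
result = d.compare(lines_1, lines_2)
print(''.join(result))  # '- ' lines are only in file 1, '+ ' lines only in file 2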

For more details about usage please refer to the official documentation.
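As a rough illustration of the line-based approach, here is a minimal sketch that keeps only the lines that differ. The helper name `changed_lines` and the shortened sample data are assumptions made for this example, not part of the question. Lines are stripped first so that indentation or trailing spaces do not show up as differences.

```python
from difflib import ndiff

def changed_lines(text_1, text_2):
    """Return only the lines that differ between the two texts.

    Hypothetical helper: '- ' marks a line found only in the first text,
    '+ ' a line found only in the second one.
    """
    # Normalise whitespace and drop blank lines before comparing.
    a = [line.strip() for line in text_1.splitlines() if line.strip()]
    b = [line.strip() for line in text_2.splitlines() if line.strip()]
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; skip those.
    return [line for line in ndiff(a, b) if line.startswith(("- ", "+ "))]

# Shortened sample data modelled on the files in the question.
sample_1 = """
    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
"""
sample_2 = """
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
"""

diff = changed_lines(sample_1, sample_2)
for line in diff:
    print(line)
```

This treats the XML as plain text, so it reports whole lines rather than individual attributes, and it is sensitive to attribute order and formatting.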

Fabio Menegazzo
  • Thank you for your input. Can difflib also compare only parts of the XML files rather than the entire document? The structure of my XML files is very complex. – Radheya Jul 08 '15 at 11:54
  • @DhruvJ if you want to go deeper into XML comparison, I would suggest using third-party libraries built for this purpose, such as ``xmldiff`` (available through pip) or ``formencode`` (available at BitBucket [here](http://bitbucket.org/ianb/formencode/src/tip/formencode/doctest_xml_compare.py#cl-70)). Regarding the second library, there is a similar discussion [here](http://stackoverflow.com/questions/3007330/xml-comparison-in-python). – Fabio Menegazzo Jul 08 '15 at 13:07
  • @FabioMenegazzo I will look into that. Thank you for your suggestions. – Radheya Jul 08 '15 at 13:24
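
A structure-aware comparison of the kind discussed in these comments can also be sketched with the standard library alone, without a third-party package. The sketch below parses each file, reduces every feature element to a (type, car, text) tuple, and lets `difflib.SequenceMatcher` align the two lists so that a missing element does not shift everything after it. Several things here are assumptions: the namespace URI is made up (the question does not show the declaration for `p1`), the function names `features` and `compare` are hypothetical, the sample data is shortened, and `xml.etree.ElementTree` stands in for lxml.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

NS = "http://example.com/p1"  # hypothetical URI; the real one is not shown in the question

def features(xml_text):
    """Return a (type, car, text) tuple per feature element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), el.get("car"), (el.text or "").strip())
            for el in root.iter("{%s}feature" % NS)]

def compare(xml_1, xml_2):
    """Align both feature lists and describe how file two differs from file one."""
    a, b = features(xml_1), features(xml_2)
    missing = 'In file %s feature car="%s", type="%s" and text = %s is missing'
    messages = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        old, new = a[i1:i2], b[j1:j2]
        if tag == "equal":
            continue
        if tag == "replace" and len(old) == len(new):
            # Same number of elements on both sides: report the fields that changed.
            for one, two in zip(old, new):
                for name, v1, v2 in zip(("type", "car", "text"), one, two):
                    if v1 != v2:
                        messages.append(
                            'In file two %s="%s" whereas in file one it is %s="%s"'
                            % (name, v2, name, v1))
            continue
        # Otherwise report elements present on one side only.
        messages += [missing % ("two", car, t, text) for t, car, text in old]
        messages += [missing % ("one", car, t, text) for t, car, text in new]
    return messages

# Shortened sample data modelled on the files in the question.
FILE_1 = """<p1:car xmlns:p1="http://example.com/p1">
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>"""
FILE_2 = """<p1:car xmlns:p1="http://example.com/p1">
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>"""

messages = compare(FILE_1, FILE_2)
for message in messages:
    print(message)
```

Whether a changed element is reported as modified or as one missing plus one extra depends on how the matcher aligns the lists, so for heavily reordered files a dedicated library such as ``xmldiff`` is probably the better choice.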