
I'm using ndiff to compare two text files and count how many diff lines were found. At some point I noticed that I was getting two different values depending on where the line of code was placed...

Can anyone shed some light on what I'm doing wrong here?

I'm sure it must be a very silly thing. Thanks!

This is the code and output...

import difflib

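# Read both files with context managers so the handles are closed promptly
with open('file1.txt', encoding="utf8") as f1:
    text1 = f1.readlines()
with open('file2.txt', encoding="utf8") as f2:
    text2 = f2.readlines()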
print("Showing Data")
print("text1 => " + str(text1))
print("text2 => " + str(text2))
print("DONE!")
print("***********************************************************************")
print("Messing with ndiff")
diff_count = difflib.ndiff(text1, text2)
print("What is in diff_count? " + str(list(diff_count)))
print("Size of List =>>>" + str(len(list(diff_count))))
print("DONE!")
print("***********************************************************************")
print("Messing with ndiff II")
diff_count = difflib.ndiff(text1, text2)
print("Size of List =>>>" + str(len(list(diff_count))))
print("What is in diff_count? " + str(list(diff_count)))
print("DONE!")
print("***********************************************************************")

And the output...

Showing Data
text1 => ['opentechguides website contains\n', 'tutorials and howto articles\n', 'on topics such as Linux\n', 'Windows, databases etc.,']
text2 => ['opentechguides website contains\n', 'tutorials and howto articles\n', '\n', 'on topics such as Linux\n', 'Windows, databases , networking\n', 'programming and web development.']
DONE!
***********************************************************************
Messing with ndiff
What is in diff_count? ['  opentechguides website contains\n', '  tutorials and howto articles\n', '+ \n', '  on topics such as Linux\n', '- Windows, databases etc.,', '?                      ^^^\n', '+ Windows, databases , networking\n', '?                    +++  ^^^^^^^^\n', '+ programming and web development.']
Size of List =>>>0
DONE!
***********************************************************************
Messing with ndiff II
Size of List =>>>9
What is in diff_count? []
DONE!
***********************************************************************
khelwood
Mr.Z.68

2 Answers


Some objects in Python, such as iterators and generators, can only be iterated over once. If you try to iterate over them a second time, they yield no elements. Example:

>>> x = iter([1,2,3,4])
>>> list(x)
[1, 2, 3, 4]
>>> list(x)
[]

I suspect diff_count is one such object. If you call list on it twice, the first call returns a list with 9 elements and the second returns an empty list. This explains the discrepancy between your two code sections. The first section shows the 9 elements of the list but reports a length of zero, because the object is already exhausted by the time of the len call. The second section reports a length of nine but shows an empty list, because the object is already exhausted by the time of the str(list(diff_count)) call.
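To check that suspicion, here is a small sketch that inlines the two lists from the question's output instead of reading the files. The variable names (delta, is_generator, first_pass, second_pass) are only illustrative. It tests whether the object returned by ndiff is a generator and compares what two consecutive list calls produce.

```python
import difflib
import types

# Same content as file1.txt / file2.txt, copied from the question's output
text1 = ['opentechguides website contains\n', 'tutorials and howto articles\n',
         'on topics such as Linux\n', 'Windows, databases etc.,']
text2 = ['opentechguides website contains\n', 'tutorials and howto articles\n',
         '\n', 'on topics such as Linux\n', 'Windows, databases , networking\n',
         'programming and web development.']

delta = difflib.ndiff(text1, text2)

# Is the returned object a generator (a one-shot iterator)?
is_generator = isinstance(delta, types.GeneratorType)

# The first pass consumes every delta line; the second finds nothing left
first_pass = list(delta)
second_pass = list(delta)

print("generator?", is_generator)
print("first pass length:", len(first_pass))
print("second pass length:", len(second_pass))
```

The second list call does not restart the comparison. It only asks the already exhausted generator for more items, which is why the order of the two print statements in the question changes the result.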

If you want to iterate over diff_count multiple times, convert it exactly once into a type that can be iterated repeatedly, such as a list, and iterate over that instead.

diff_count = difflib.ndiff(text1, text2)
seq = list(diff_count)
print("What is in diff_count? " + str(seq))
print("Size of List =>>>" + str(len(seq)))
Kevin

ndiff returns a generator, not a list:

return a Differ-style delta (a generator generating the delta lines)

Hence, the first time you iterate through it you get the full delta, and the second time you get nothing. The solution is to turn it into a list as soon as you get it, then reuse that list as many times as you need:

diff_count = list(difflib.ndiff(text1, text2))
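Since the question is about counting diffs, note that len of the full list also counts unchanged lines and "? " hint lines. As a rough sketch (one possible interpretation of "how many diffs", not necessarily what the asker needs), the materialized list can be grouped by the two-character prefix that ndiff puts on each line. The data is inlined from the question, and the names delta, prefix_counts, added and removed are only illustrative.

```python
import difflib
from collections import Counter

# Same content as file1.txt / file2.txt, copied from the question's output
text1 = ['opentechguides website contains\n', 'tutorials and howto articles\n',
         'on topics such as Linux\n', 'Windows, databases etc.,']
text2 = ['opentechguides website contains\n', 'tutorials and howto articles\n',
         '\n', 'on topics such as Linux\n', 'Windows, databases , networking\n',
         'programming and web development.']

# Materialize the generator once; the list can then be reused freely
delta = list(difflib.ndiff(text1, text2))

# Each delta line starts with a two-character code:
# "  " common line, "- " only in text1, "+ " only in text2, "? " intraline hint
prefix_counts = Counter(line[:2] for line in delta)

added = prefix_counts["+ "]
removed = prefix_counts["- "]

print("total delta lines:", len(delta))
print("added:", added, "removed:", removed)
```

Because delta is a list here, both len(delta) and the Counter pass see all of the lines, regardless of the order in which they run.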
scnerd