difflib output is very strange, adding extra whitespace on each character

Question

I'm playing around with difflib in Python and I'm having some difficulty getting the output to look good. For some strange reason, difflib is adding a single space before each character. For example, I have a file (textfile01.txt) that looks like this:

test text which has no meaning

and textfile02.txt

test text which has no meaning

but looks nice

Here's a small code sample for how I'm trying to accomplish the comparison:

import difflib

handle01 = open('textfile01.txt', 'r')
handle02 = open('textfile02.txt', 'r')

diff = difflib.ndiff(handle01.read(), handle02.read())
print "".join(list(diff))

Then, I get this ugly output that looks...very strange:

t e s t t e x t w h i c h h a s n o m e a n i n g-

- b- u- t- - l- o- o- k- s- - n- i- c- e

As you can see, the output looks horrible. I've just been following basic difflib tutorials I found online, and according to those, the output should look completely different. I have no clue what I'm doing wrong. Any ideas?

score 8 · Accepted Answer · answered Jan 16 '15 at 23:38

difflib.ndiff compares lists of strings, but you are passing it plain strings, and a string is itself a sequence of characters. The function is thus comparing the two files character by character.

>>> list(difflib.ndiff("test", "testa"))
['  t', '  e', '  s', '  t', '+ a']

(Literally, you can go from the list ["t", "e", "s", "t"] to the list ["t", "e", "s", "t", "a"] by adding the element "a" at the end.)

You want to change read() to readlines() so you can compare the two files in a linewise fashion, which is probably what you were expecting.

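You may also want to change "".join(... to "\n".join(... in order to get a diff-like output on screen when the input lines carry no trailing newline, as in the example below. (Lines returned by readlines() keep their trailing newline, so in that case "".join(... already puts each entry on its own line.)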

>>> list(difflib.ndiff(["test"], ["testa"]))
['- test', '+ testa', '?     +\n']
>>> print "\n".join(_)
- test
+ testa
?     +

(Here difflib is being extra nice and marking the exact position where the character was added in the ? line.)
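Putting the two changes together, the script from the question could be reworked roughly as follows. This is only a sketch: it assumes Python 3 syntax (the snippets above use the Python 2 print statement), and diff_files is a hypothetical helper name, not something difflib provides.

```python
import difflib


def diff_files(path_a, path_b):
    """Return an ndiff-style comparison of two text files as one string.

    diff_files is a hypothetical helper name chosen for this sketch.
    """
    # readlines() returns a list of lines, so ndiff compares the files
    # line by line instead of character by character.
    with open(path_a, "r") as handle_a, open(path_b, "r") as handle_b:
        lines_a = handle_a.readlines()
        lines_b = handle_b.readlines()

    delta = difflib.ndiff(lines_a, lines_b)

    # Entries built from readlines() keep their trailing newline; strip it
    # so that joining with "\n" does not leave blank lines in between.
    return "\n".join(line.rstrip("\n") for line in delta)
```

Calling print(diff_files("textfile01.txt", "textfile02.txt")) should then show unchanged lines prefixed with two spaces and the lines that exist only in the second file prefixed with "+ ".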

That fixed it. I didn't realize it was looking for a list of strings. Most of the examples I was looking at "appeared" to be using normal strings. Thanks for your assistance! - erichar7, Jan 19 '15 at 22:31
Indeed, the official example in the official documentation uses strings: https://docs.python.org/3/library/difflib.html#difflib.Differ - Maïeul, May 14 '21 at 09:39
