I'm trying to compare two strings with the 'difflib' library.

How can I get just the terms that are specific to the first string, and separately those specific to the second?

Example:

import difflib

one = "If rents are received later than five (5)"
two = "If rents are received later than eight (8)"

n_one = one.replace(" ","\n")
n_two = two.replace(" ","\n")

diff = difflib.ndiff(n_one.splitlines(1),n_two.splitlines(1))

print(''.join(diff))
# ...
# - five
# - (5) + eight
# + (8)

I would like to end up with two lists:

Difference in first string:

['five','(5)']

Difference in second string:

['eight','(8)']
okeoke

2 Answers

1
    import difflib

    one = "If rents are received later than five (5)"
    two = "If rents are received later than eight (8)"

    n_one = one.replace(" ","\n")
    n_two = two.replace(" ","\n")

    diff = difflib.ndiff(n_one.splitlines(0),n_two.splitlines(0))

    one_lst = []
    two_lst = []

    for change in diff:
        if change[0] == "-":
            one_lst.append(change[2:])
        elif change[0] == "+":
            two_lst.append(change[2:])

    >>> one_lst
    ['five', '(5)']
    >>> two_lst
    ['eight', '(8)']
Tomerikoo
  • Thanks so much for the reply! It works perfectly. I was wondering: what does the change[2:] code do? – okeoke Jun 19 '19 at 10:20
  • @swamz `change` is a string that starts with a two-character code (`- `, `+ `, `? ` or two spaces) followed by the text of the line. So `change[0]` checks for `+` or `-`, and `change[2:]` slices off that prefix and keeps the actual change up to the end of the string. Have a look [here](https://www.pythoncentral.io/cutting-and-slicing-strings-in-python/) – Tomerikoo Jun 19 '19 at 11:24
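
To make the slicing in the comment above concrete, here is a minimal sketch that prints each line produced by `ndiff` next to its first character and the text after the two-character prefix. The short word lists `a` and `b` are made up for illustration and are not from the question.

```python
import difflib

# Shortened, illustrative inputs (assumption: not the full sentences from the question).
a = ["than", "five", "(5)"]
b = ["than", "eight", "(8)"]

changes = list(difflib.ndiff(a, b))

for change in changes:
    code = change[0]   # '-', '+', '?' or ' ' (unchanged line)
    text = change[2:]  # everything after the two-character prefix
    print(repr(change), "->", repr(code), repr(text))

# Keep only the text of removed and added lines, as the answer does.
removed = [c[2:] for c in changes if c[0] == "-"]
added = [c[2:] for c in changes if c[0] == "+"]
```

Lines starting with `?` (intraline hints) or a space (unchanged) are skipped simply because they match neither branch.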
0

As a one-liner, not using difflib:

>>> first, second = zip(*[(a, b) for a, b in zip(one.split(" "), two.split(" ")) if a != b])
>>> first
('five', '(5)')
>>> second
('eight', '(8)')

This, of course, works only because both strings split into the same number of words and differ at exactly the same positions: `zip` pairs words by index and stops at the end of the shorter input. If the second string ended with "eight(8)" instead, this would miss the '(5)' in the diff.
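
If the two strings may contain a different number of words, one possible workaround is to align the word lists with `difflib.SequenceMatcher` instead of pairing them by index. This is only a sketch: the helper name `word_diff` is hypothetical, and splitting on whitespace is an assumption about what counts as a term.

```python
import difflib

def word_diff(first, second):
    """Return (words only in first, words only in second), aligned by content."""
    a, b = first.split(), second.split()
    # autojunk=False turns off the "popular element" heuristic for long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' and 'insert' blocks: one slice may be empty.
            only_a.extend(a[i1:i2])
            only_b.extend(b[j1:j2])
    return only_a, only_b

one = "If rents are received later than five (5)"
two = "If rents are received later than eight (8)"
three = "If rents are received later than eight(8)"  # the unequal-length case

same_length = word_diff(one, two)
different_length = word_diff(one, three)
```

Unlike the `zip` one-liner, this also returns two empty lists when the strings are identical instead of failing on the unpacking.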

9769953