
First example:

from difflib import SequenceMatcher

one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk', 'ff', 'btb']
opcodes1 = SequenceMatcher(None, one, two).get_opcodes()
opcodes2 = SequenceMatcher(None, two, one).get_opcodes()

correctly reports 'ff' as an insert:

[('equal', 0, 2, 0, 2), ('replace', 2, 3, 2, 3), ('equal', 3, 4, 3, 4), ('insert', 4, 4, 4, 5), ('equal', 4, 5, 5, 6)]
[('equal', 0, 2, 0, 2), ('replace', 2, 3, 2, 3), ('equal', 3, 4, 3, 4), ('delete', 4, 5, 4, 4), ('equal', 5, 6, 4, 5)]
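For reference, the opcodes are built from the matching blocks that SequenceMatcher finds (in CPython's difflib, get_opcodes() is computed from get_matching_blocks()). A minimal sketch that prints those blocks for the first example; the loop variable names are just for illustration. Here 'kk' is one of the matching blocks, so the unmatched elements are cut into two separate gaps: ['gd'] vs ['hh'] and nothing vs ['ff'].

```python
from difflib import SequenceMatcher

one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk', 'ff', 'btb']

# Each Match(a, b, size) means one[a:a+size] == two[b:b+size];
# the last entry is always a zero-size sentinel.
blocks = SequenceMatcher(None, one, two).get_matching_blocks()
for m in blocks:
    print(m, one[m.a:m.a + m.size])
```

The blocks come out as (0, 0, 2), (3, 3, 1) for 'kk', (4, 5, 1) for 'btb', plus the (5, 6, 0) sentinel. The gap before 'kk' becomes the 'replace' and the gap after it becomes the 'insert'.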

Now I would like get_opcodes() to find an 'insert' that sits right next to a 'replace', but it doesn't.

Second example:

one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk1', 'ff', 'btb']
opcodes1 = SequenceMatcher(None, one, two).get_opcodes()
opcodes2 = SequenceMatcher(None, two, one).get_opcodes()

returns:

[('equal', 0, 2, 0, 2), ('replace', 2, 4, 2, 5), ('equal', 4, 5, 5, 6)]
[('equal', 0, 2, 0, 2), ('replace', 2, 5, 2, 4), ('equal', 5, 6, 4, 5)]
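Why the second example collapses into one 'replace' can be seen by rebuilding the opcodes from the matching blocks. `opcodes_from_blocks` below is a hypothetical re-implementation written for illustration, not difflib's own code. It assumes the rule that each gap between consecutive matching blocks yields at most one opcode: 'replace' if both sides have leftover elements, 'delete' or 'insert' if only one side does. Since 'kk' no longer equals 'kk1', the only blocks are 'billy', 'sally' and 'btb', so 'gd', 'kk' versus 'hh', 'kk1', 'ff' is a single gap.

```python
from difflib import SequenceMatcher


def opcodes_from_blocks(blocks):
    """Hypothetical helper: one opcode per gap between matches, then an 'equal'."""
    i = j = 0
    ops = []
    for a, b, size in blocks:
        # Leftover elements on both sides become ONE 'replace'; it is never split.
        if i < a and j < b:
            ops.append(('replace', i, a, j, b))
        elif i < a:
            ops.append(('delete', i, a, j, b))
        elif j < b:
            ops.append(('insert', i, a, j, b))
        if size:
            ops.append(('equal', a, a + size, b, b + size))
        i, j = a + size, b + size
    return ops


one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk1', 'ff', 'btb']
sm = SequenceMatcher(None, one, two)
print(sm.get_matching_blocks())
print(opcodes_from_blocks(sm.get_matching_blocks()))
```

The rebuilt list is the same as get_opcodes() for this input, which suggests the merging is not a bug but a consequence of there being no exact match between 'billy', 'sally' and 'btb'.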

In the next example I try to force the difference to be recognized by adding padding between the items. Amazingly, the padding is ignored. This is surprising, because in the first example 'kk' acts as exactly this kind of padding, stopping the 'gd' vs 'hh' replacement from being merged with the 'ff' insert.

Third example:

one = ['///////', 'billy', '///////', 'sally', '///////', 'gd', '///////', 'kk', '///////', 'btb']
two = ['///////', 'billy', '///////', 'sally', '///////', 'hh', '///////', 'kk1', '///////', 'ff', '///////', 'btb']
opcodes1 = SequenceMatcher(None, one, two).get_opcodes()
opcodes2 = SequenceMatcher(None, two, one).get_opcodes()

returns:

[('equal', 0, 5, 0, 5), ('replace', 5, 6, 5, 6), ('equal', 6, 7, 6, 7), ('replace', 7, 8, 7, 10), ('equal', 8, 10, 10, 12)]
[('equal', 0, 5, 0, 5), ('replace', 5, 6, 5, 6), ('equal', 6, 7, 6, 7), ('replace', 7, 10, 7, 8), ('equal', 10, 12, 8, 10)]

Once again, it fails to recognize the inserted value 'ff', even though it is clearly there.
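A sketch of why the padding only appears to be ignored, using nothing beyond the public get_matching_blocks() and get_opcodes() calls. Because every '///////' is identical, the matcher is free to choose which divider pairs with which. In the region after 'gd'/'hh' it first takes the longest block, '///////', 'btb' (one[8:10] against two[10:12]), so one[8] is paired with two[10]. That leaves a single divider in one (one[6]) for two candidates in two (two[6] and two[8]); two[8], the divider between 'kk1' and 'ff', stays unmatched and ends up inside the replace.

```python
from difflib import SequenceMatcher

one = ['///////', 'billy', '///////', 'sally', '///////', 'gd',
       '///////', 'kk', '///////', 'btb']
two = ['///////', 'billy', '///////', 'sally', '///////', 'hh',
       '///////', 'kk1', '///////', 'ff', '///////', 'btb']

sm = SequenceMatcher(None, one, two)
print(sm.get_matching_blocks())

# Print what each non-equal opcode covers on both sides.
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag != 'equal':
        print(tag, one[i1:i2], two[j1:j2])
```

The second 'replace' covers ['kk'] against ['kk1', '///////', 'ff']. The padding is present, but with no exact match between 'kk' and 'kk1' nothing inside that gap can serve as an anchor.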

Rhys
  • What do you mean by "find an insert that is next to a replace"? – Dani Mesejo Oct 27 '19 at 23:30
  • `get_opcodes()` can identify whether text is 'equal', 'replace', 'insert' or 'delete' when comparing two texts – Rhys Oct 27 '19 at 23:32
  • Yes, I am aware. So do you mean that when you reverse the parameters, the corresponding one is a replace? It seems to me that the output in the first example is correct. – Dani Mesejo Oct 27 '19 at 23:34
  • No, the `insert` is not recognized at all. It is "merged" into the `replace`. But it **is** recognized when a divider is used, such as in example 1, where `'kk'` divides `'hh'` from `'ff'`. If I add my own dividers, though, it says... nope! – Rhys Oct 27 '19 at 23:38
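One possible workaround, sketched under the assumption that "almost equal" elements such as 'kk' and 'kk1' should count as anchors: post-process each 'replace' block, pair its most similar elements (character-level SequenceMatcher.ratio() at or above a cutoff) and recurse on the pieces on either side. This is loosely modeled on the way difflib.Differ compares characters within near-matching lines; `refine_opcodes`, `_split`, `_plain` and the 0.75 cutoff are hypothetical choices, not part of difflib's API.

```python
from difflib import SequenceMatcher


def _plain(i1, i2, j1, j2):
    """Hypothetical helper: one opcode for a gap with no anchors (none if empty)."""
    if i1 < i2 and j1 < j2:
        return [('replace', i1, i2, j1, j2)]
    if i1 < i2:
        return [('delete', i1, i2, j1, j2)]
    if j1 < j2:
        return [('insert', i1, i2, j1, j2)]
    return []


def _split(a, b, i1, i2, j1, j2, cutoff):
    """Hypothetical helper: split a gap around its most similar pair of elements."""
    best, best_ratio = None, -1.0
    for i in range(i1, i2):
        for j in range(j1, j2):
            r = SequenceMatcher(None, a[i], b[j]).ratio()
            if r > best_ratio:
                best, best_ratio = (i, j), r
    if best is None or best_ratio < cutoff:
        return _plain(i1, i2, j1, j2)
    i, j = best
    # Recurse on both sides of the near-match, which becomes a 1:1 'replace'.
    return (_split(a, b, i1, i, j1, j, cutoff)
            + [('replace', i, i + 1, j, j + 1)]
            + _split(a, b, i + 1, i2, j + 1, j2, cutoff))


def refine_opcodes(a, b, cutoff=0.75):
    """Hypothetical: get_opcodes(), with 'replace' blocks split at near-matches."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'replace':
            ops.extend(_split(a, b, i1, i2, j1, j2, cutoff))
        else:
            ops.append((tag, i1, i2, j1, j2))
    return ops


one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk1', 'ff', 'btb']
print(refine_opcodes(one, two))
```

For the second example this gives [('equal', 0, 2, 0, 2), ('replace', 2, 3, 2, 3), ('replace', 3, 4, 3, 4), ('insert', 4, 4, 4, 5), ('equal', 4, 5, 5, 6)], i.e. an 'insert' of 'ff' next to a 'replace'. The pairing search is quadratic in the size of each replace block, which is fine for short lists like these but may need a cheaper similarity test for large inputs.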
