When I add a line to the middle of a file, all following lines have their number incremented.

Is there a utility that generates the list of equivalent line numbers between two files?

The output would be something like:

1 1
2 2
3 4 (line added)
4 5

One could probably write such a utility using dynamic programming, in a way similar to the diff algorithm. It seems useful; hasn't this already been done?
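
For reference, the dynamic-programming idea mentioned above could be sketched roughly as follows. This is only an illustration based on the textbook longest-common-subsequence table, not the algorithm any particular diff tool uses; the function name `lcs_line_pairs` is made up for this example, and the quadratic table makes it suitable for small files only.

```python
def lcs_line_pairs(a, b):
    """Return 1-based (i, j) pairs of lines matched by a longest common subsequence."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top, emitting a pair whenever two lines match.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i + 1, j + 1))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skip a line that only exists in a
        else:
            j += 1  # skip a line that only exists in b
    return pairs


# One line ("x") inserted after the second line, as in the sample output above.
for pair in lcs_line_pairs(["a", "b", "c", "d"], ["a", "b", "x", "c", "d"]):
    print("%d %d" % pair)
```

With that input the pairs are 1 1, 2 2, 3 4 and 4 5, which matches the desired output format.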

templatetypedef
Penz
  • Be a little more concrete. Do you need a library to embed into another framework (which technology?), or just a tool to view the result (some diff tool like KDiff)? A slightly naive idea: merge the two files and output all lines that occur more than once. – Adrian Dec 20 '12 at 13:24
  • A single Unix-like utility that outputs line number equivalences would be fine. In the end, I want to check whether the errors found by static analysis tools in two different file versions are the same, but I keep running into line number differences when lines are inserted or removed. – Penz Dec 20 '12 at 15:22
  • Concatenating the two files with "cat" and filtering the duplicate lines with "uniq" might do the job. I haven't checked it, but it could work like this: `cat file1 > result; cat file2 >> result; uniq -d result` – Adrian Dec 20 '12 at 19:06

1 Answer

I found out that this is pretty easy to do with Python's difflib:

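import difflib

def seq_equivs(s1, s2):
    """Return 1-based (line in s1, line in s2) pairs for lines matched in both."""
    equiv = []
    # autojunk=False disables the heuristic that ignores very frequent
    # lines (blank lines, braces) once an input has 200 or more items.
    s = difflib.SequenceMatcher(a=s1, b=s2, autojunk=False)
    for m in s.get_matching_blocks():
        # The last block is always a zero-length sentinel.
        if m.size == 0:
            break
        for n in range(1, 1 + m.size):
            equiv.append((m.a + n, m.b + n))
    return equiv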

Example usage:

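# splitlines() avoids the spurious empty last element that split('\n')
# produces for files ending in a newline.
with open('file1.txt') as f:
    f1 = f.read().splitlines()
with open('file2.txt') as f:
    f2 = f.read().splitlines()

for equivs in seq_equivs(f1, f2):
    print('%d %d' % equivs)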
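
For the use case from the comments (comparing warnings reported against two versions of a file), the same matching blocks can be turned into a lookup table. The following is a minimal sketch under assumptions: the helper name `line_map`, the sample lines, and the `(line, message)` warning format are all invented for illustration and do not come from any particular static analysis tool.

```python
import difflib


def line_map(old_lines, new_lines):
    """Map 1-based old line numbers to new ones for lines that are matched."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        # The final zero-length sentinel block contributes nothing here.
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


old = ["int x;", "x = 1;", "return x;"]
new = ["int x;", "// set x", "x = 1;", "return x;"]
mapping = line_map(old, new)

# Hypothetical warnings reported on the old version, as (line, message) pairs.
old_warnings = [(2, "unused assignment"), (3, "missing check")]

# Lines that were removed or changed have no entry, so .get() yields None.
translated = [(mapping.get(line), msg) for line, msg in old_warnings]
print(translated)  # [(3, 'unused assignment'), (4, 'missing check')]
```

Warnings whose translated line number is None point at lines that no longer exist unchanged in the new version, so they need to be checked by hand.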
Penz