
I have two lists, and I use the following function to assign line numbers to them (similar to nl in Unix):

import fileinput
from math import log10

def nl(inFile):
    numberedLines = []
    for line in fileinput.input(inFile):
        numberedLines.append(str(fileinput.lineno()) + ':  ' + line)
    numberWidth = int(log10(fileinput.lineno())) + 1
    for i, line in enumerate(numberedLines):
        num, rest = line.split(':',1)
        fnum = str(num).rjust(numberWidth)
        numberedLines[i] = ':'.join([fnum, rest])
    return ''.join(numberedLines)

This returns lists like: 1: 12 14 2: 20 49 3: 21 28. With the input file I am using, the line numbers are very important. My second list is structured the same way, but its line numbers mean nothing. I need to find the values that are missing from the second file and return their line numbers from the first. For example, if the second file has: 5: 12 14 48: 20 49 I want to return ONLY 3, which is the line number in the first list of the values missing from the second.

Here is what I've tried:

oldtxt = 'master_list.txt'  # Line numbers are significant
newFile = 'list2compare.txt' # Line numbers don't matter

s = set(nl(oldtxt))
diff = [x for x in (newFile) if x not in s]
print diff

This returns: ['12 14\n', '20 49\n', '21 28\n'], which is clearly not what I need. Any ideas?

KennyC

3 Answers


How about the following:

f1 = """\
12 14
20 49
21 28
"""

f2 = """\
12 14
20 49
"""

def parse(lines):
  "Take a list of lines, turn into a dict of line number => value pairs"
  # Number the lines before skipping blanks so the keys stay true line numbers
  return dict((i + 1, v) for i, v in enumerate(lines) if v)

def diff(a, b):
  """
  Given two dicts from parse(), go through each lineno => value pair in a and,
  if the value is in b's values, discard it; finally, return the remaining
  lineno => value pairs
  """
  bvals = frozenset(b.values())
  return dict((ak, av) for ak, av in a.items() if av not in bvals)

def fmt(d):
  "Turn lineno => value pairs into '  lineno: value' strings"
  if not d:
    return []  # nothing differs, so max() below would fail on an empty dict
  nw = len(str(max(d.keys())))
  return ["{0:>{1}}: {2}".format(k, nw, v) for k, v in sorted(d.items())]

d1 = parse(f1.splitlines())
print d1
print
d2 = parse(f2.splitlines())
print d2
print
d = diff(d1, d2)
print d
print
print "\n".join(fmt(d))

Which gives me the output:

{1: '12 14', 2: '20 49', 3: '21 28'}

{1: '12 14', 2: '20 49'}

{3: '21 28'}

3: 21 28
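
If only the line numbers are wanted (the question asks for just 3), the same idea can be reduced to a single function. This is a minimal sketch, not part of the code above: the name missing_line_numbers is made up for illustration, it assumes that surrounding whitespace should be ignored when comparing, and it is written with print() so that it runs on Python 3, whereas the snippets in this thread use the Python 2 print statement.

```python
def missing_line_numbers(master_lines, compare_lines):
    """Return the 1-based numbers of master lines whose text is absent from compare_lines."""
    # A set gives constant-time membership tests; strip() drops newlines and padding.
    compare_values = set(line.strip() for line in compare_lines)
    return [
        number
        for number, line in enumerate(master_lines, 1)  # number every line, blanks included
        if line.strip() and line.strip() not in compare_values  # blank lines are not reported
    ]


master = "12 14\n20 49\n21 28\n".splitlines()
compare = "12 14\n20 49\n".splitlines()
print(missing_line_numbers(master, compare))
```

With real files, the two arguments could simply be open file objects, since iterating over a file yields its lines.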
spiralx
  • Thank you. Using your idea, I get something that looks like `1: 1: 0 2`, which displays all lines in the master file but doesn't show any differences. So that is `1:` from the master list, `1:` from the compare list, and then the actual digits – KennyC Sep 27 '12 at 18:51
  • I've added comments and the output I get at each stage; does that help? Oh, and there was a bug in the fmt() function, which I've fixed, that would have made the formatting screwy. – spiralx Sep 28 '12 at 10:48

I'll take a stab at this ;) It sounds like you are after the line numbers of the master file where the contents of that line are also in the compare file. Is this what you are after? In that case I propose...

Master file contents...

1 2 3 4
test
6 7 8 9
compare
me

Compare file contents...

6 7 8 9
10 11 12 13
me

Code:

master_file = open('file path').read()
compare_file = open('file path').read()

lines_master = master_file.splitlines()
lines_compare = compare_file.splitlines()
same_lines = []
for i,line in enumerate(lines_master):
    if line in lines_compare:
        same_lines.append(i+1)

print same_lines

Result is [3, 5]
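
The loop above tests each line against a list, which rescans the compare file for every master line, and it reports the matching lines rather than the missing ones. A possible variation, sketched under the assumption that differences in spacing should not count as differences, collects both results with a set lookup. The names normalize and split_same_and_missing are hypothetical, and the sketch uses Python 3 style print().

```python
def normalize(line):
    """Collapse runs of whitespace so '6 7 8 9 ' and '6  7 8 9' compare equal."""
    return ' '.join(line.split())


def split_same_and_missing(master_lines, compare_lines):
    """Return (same, missing): 1-based master line numbers found / not found in compare_lines."""
    compare_values = set(normalize(line) for line in compare_lines)
    same, missing = [], []
    for number, line in enumerate(master_lines, 1):
        if normalize(line) in compare_values:
            same.append(number)
        else:
            missing.append(number)
    return same, missing


master = ["1 2 3 4", "test", "6 7 8 9 ", "compare", "me"]
other = ["6 7 8 9", "10 11 12 13", "me"]
same, missing = split_same_and_missing(master, other)
print(same)
print(missing)
```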

b10hazard
  • @radio Thank you. I get an empty list when using your method, although there are certainly matches. The problem might be that the two files will never have the same line numbers, just the same text on various line numbers – KennyC Sep 27 '12 at 18:58

You can use difflib for this:

>>> f1 = """1 2 3 4
... test
... 6 7 8 9
... compare
... me
... """
>>> 
>>> f2 = """6 7 8 9
... 10 11 12 13
... me
... """
>>>
>>> import difflib
>>> for line in difflib.ndiff(f1.splitlines(), f2.splitlines()):
...    if line.startswith('-'):
...       print "Second file is missing line: '%s'" % line
...    if line.startswith('+'):
...       print "Second file contains additional line: '%s'" % line
... 
Second file is missing line: '- 1 2 3 4'
Second file is missing line: '- test'
Second file is missing line: '- compare'
Second file contains additional line: '+ 10 11 12 13'
jterrace
  • Thank you. Unfortunately, I believe this example is actually checking line-number integrity: a cursory look shows that it treats `f1 = 1: 1 10` as different from `f2 = 8: 1 10`, and I need to overlook the fact that the line numbers do not agree. – KennyC Sep 27 '12 at 19:37
  • yeah, you would have to strip out the line numbers first – jterrace Sep 27 '12 at 19:56
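
Stripping the line numbers first could look something like the following. This is only a sketch: it assumes every non-blank line has the `N: value` shape produced by nl() in the question, the helper names split_numbered and missing_from_numbered are invented for illustration, and print() is used so that it runs on Python 3.

```python
def split_numbered(line):
    """Split a line such as ' 3: 21 28' into (3, '21 28')."""
    number, _, rest = line.partition(':')
    return int(number), rest.strip()


def missing_from_numbered(master_lines, compare_lines):
    """Return the master line numbers whose values do not appear in compare_lines."""
    # Only the values of the compare file matter; its line numbers are thrown away.
    compare_values = set(
        split_numbered(line)[1] for line in compare_lines if line.strip()
    )
    result = []
    for line in master_lines:
        if not line.strip():
            continue  # skip blank lines
        number, value = split_numbered(line)
        if value not in compare_values:
            result.append(number)  # report the number carried by the master line
    return result


master = ["1: 12 14\n", "2: 20 49\n", "3: 21 28\n"]
compare = ["5: 12 14\n", "48: 20 49\n"]
print(missing_from_numbered(master, compare))
```

The same stripping step could be applied to both inputs before handing them to difflib.ndiff, so that only the values are compared.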