python compare rows in big files

Question

I need to compare two .csv files (each over 65,000 lines) and find the lines that are not in the second file. I am using difflib.ndiff:

import difflib

for line in difflib.ndiff(text1, text2):
    print(line)

But I get unexpected results. The function finds two identical strings and marks them as different:

+ Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,
- Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,

What could be the problem?
What might be a suitable way to find the differences?

2.

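from itertools import zip_longest  # izip_longest in Python 2

# Read both files, stripping surrounding whitespace from every line
with open('test1.txt') as f1, open('test2.txt') as f2:
    l1 = [line.strip() for line in f1]
    l2 = [line.strip() for line in f2]

# Pair lines by position; the shorter file is padded with None
for left, right in zip_longest(l1, l2):
    print('%s %s %s' % (
        left or '',
        '==' if left == right else '!=',
        right or '',
    ))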

I also tried the code above to compare the files, but I got the same unexpected result. Why is that?

Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,!=Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,
Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,!=Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,

If you're using Linux, you should use `diff` or `rdiff`. 65,000 lines is relatively small and can be handled programmatically; however, once you get into the millions, Python has a very hard time with malloc and comparisons. pandas is usually the best bet if you do need to use Python. - benjessop, Aug 06 '20 at 09:43
I already have a Python script. The only problem is that difflib does not work correctly. I need to compare each line of a file (there may be differences in any field of the line) and output the lines that are not found. - stammer, Aug 06 '20 at 11:59
In your last snippet, cast the diff items to strings, for example `str(diff[0])`. - anlgrses, Aug 11 '20 at 04:51
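The requirement stated in the comments (output the lines of the first file that are not found anywhere in the second) can be sketched without difflib. Both outputs in the question are consistent with the same rows sitting at different positions in the two files, since difflib.ndiff and a position-by-position zip both compare sequences in order; that is only a guess from the snippets shown. The function name and the sample rows below are hypothetical, and stripping whitespace is an assumption about what should count as "identical".

```python
def lines_missing_from_second(first_lines, second_lines):
    """Return stripped lines of first_lines that never occur in second_lines."""
    # Strip whitespace and line endings so '\r\n' vs '\n' does not matter
    seen = {line.strip() for line in second_lines}
    return [line.strip() for line in first_lines if line.strip() not in seen]


# Hypothetical sample data: same rows in a different order, plus one extra row
first = [
    "Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,\n",
    "Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,\n",
    "Gr4,EXTRA_ROW,100,,,70,,\n",
]
second = [
    "Gr4,DQ_3Gb_1m_DR_926_23489,100,,,70,,\r\n",
    "Gr4,DQ_1Gb_1m_DR_926_23486,100,,,70,,\r\n",
]

missing = lines_missing_from_second(first, second)
print(missing)
```

Any iterable of lines works, so two open file objects (for example test1.txt and test2.txt from the question) can be passed in place of the lists. The set lookup ignores order and duplicate counts, which may or may not be what is wanted.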

score 0 · Answer 1 · answered Aug 11 '20 at 05:57

This is easy with pandas. Since you haven't provided the dataset, I'll use my own.

Assume I have two CSV files.

The data looks like this:

Now print the line that is not present in the second file (the benz model is not present in the second file):
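The answer's own data and code are not shown here, so the following is only a minimal sketch of one way to do this with pandas, not necessarily the approach the answer used. The column names and rows are hypothetical stand-ins, with a "benz" row that appears only in the first file. It relies on `DataFrame.merge` with `how="left"` and `indicator=True`, which adds a `_merge` column recording where each row was found.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files; the answer's real data is not shown
csv1 = io.StringIO("brand,model\nbmw,x5\nmercedes,benz\naudi,a4\n")
csv2 = io.StringIO("brand,model\naudi,a4\nbmw,x5\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Left-join on all shared columns; _merge records where each row was found
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Rows marked 'left_only' exist in the first file but not in the second
only_in_first = merged.loc[merged["_merge"] == "left_only", df1.columns]
print(only_in_first)
```

With real files, `pd.read_csv` would take the file paths instead of the `StringIO` objects. Because the join matches on values rather than position, the row order in the two files does not matter.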

CodeRed
