
I have two txt files in which every line is an entry. For instance:

    #first txt file

        Jonathan25
        Donald32
        Ethan21
        mjisgoat

    #second txt file

        Ethan21
        leonardo1111
        michalengeloo
        Jonathan25

How can I write code that outputs the unique values that exist in the second txt file but not in the first one? In other words, each element of the second file should be compared against all elements of the first file, and if there is no match, I need to see that value. In this case, the result I would like to get is "leonardo1111" and "michalengeloo".

RoadRunner
Murcielago

5 Answers


An easy way in Python is to read both files into sets and then take the set difference. The newlines must also be stripped: Jonathan25\n and Jonathan25 should compare equal, but they won't if the \n is included.

with open("file1.txt") as f1, open("file2.txt") as f2:
    s1 = {line.strip() for line in f1}
    s2 = {line.strip() for line in f2}

    print(s2.difference(s1))

Output (a set has no fixed order, so the two names may appear the other way round):

{'michalengeloo', 'leonardo1111'}
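If a stable, sorted result is needed, a small variation of the same set-difference idea could look like the sketch below. The function name sorted_difference is hypothetical, and it takes any iterable of lines (an open file object works) so it can be tried with in-memory data. It also skips blank lines, which line.strip() would otherwise turn into an empty-string entry (for example from a trailing empty line in only one of the files).

```python
def sorted_difference(first_lines, second_lines):
    """Return entries present in second_lines but not in first_lines, sorted.

    Both arguments can be any iterable of strings, e.g. open file objects.
    Blank lines are ignored so a trailing empty line doesn't show up as ''.
    """
    s1 = {line.strip() for line in first_lines if line.strip()}
    s2 = {line.strip() for line in second_lines if line.strip()}
    # Set difference, then sort so the output order is deterministic.
    return sorted(s2 - s1)


# In-memory copies of the example files from the question
# (the second one with an extra blank line at the end).
first = ["Jonathan25\n", "Donald32\n", "Ethan21\n", "mjisgoat\n"]
second = ["Ethan21\n", "leonardo1111\n", "michalengeloo\n", "Jonathan25\n", "\n"]
print(sorted_difference(first, second))  # ['leonardo1111', 'michalengeloo']
```

With the real files, open file1.txt and file2.txt in a with block as above and pass the two file objects to the function.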
RoadRunner

You can use the join command on Unix. Both files have to be sorted first. Then:

$ join -1 1 -2 1 -v 2 -o 0 file1 file2
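Because its input is sorted, join can walk both files in step, like the merge phase of a merge sort. -v 2 keeps the lines of file2 that find no partner in file1, and -o 0 prints only the join field. The following is a rough Python illustration of that sorted-merge idea, not a reimplementation of join. The function name merge_anti_join is hypothetical, and unlike join it compares whole stripped lines rather than the first whitespace-separated field (the same thing for entries like these that contain no spaces).

```python
def merge_anti_join(first_lines, second_lines):
    """Yield entries of second_lines that have no equal entry in first_lines.

    Both inputs are sorted, then walked in step with one index into the
    first list that only ever moves forward, so the merge itself is linear.
    """
    a = sorted(line.strip() for line in first_lines)
    b = sorted(line.strip() for line in second_lines)
    i = 0
    for item in b:
        # Skip entries of the first file that sort before the current item.
        while i < len(a) and a[i] < item:
            i += 1
        # No equal entry in the first file, so the item is unpaired.
        if i == len(a) or a[i] != item:
            yield item


first = ["Jonathan25\n", "Donald32\n", "Ethan21\n", "mjisgoat\n"]
second = ["Ethan21\n", "leonardo1111\n", "michalengeloo\n", "Jonathan25\n"]
print(list(merge_anti_join(first, second)))  # ['leonardo1111', 'michalengeloo']
```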

Or you can use Python:

1. Create a set. Go through file1 line by line and add each word to the set.
2. Go through file2 and look up each of its words in the set you just created. The words not found in the set are the ones you need to identify.
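The two steps above could be sketched like this. The function name missing_in_first is hypothetical, and it accepts any iterable of lines, so open file objects for file1 and file2 can be passed directly. Unlike a plain set difference, the loop keeps the entries in the order they appear in file2. The extra seen set is an addition beyond the two steps, so that each missing entry is reported only once, since the question asks for unique values.

```python
def missing_in_first(first_lines, second_lines):
    """Return entries of second_lines not present in first_lines, in order."""
    # Step 1: put every entry of the first file into a set.
    known = {line.strip() for line in first_lines}
    result = []
    seen = set()  # so each missing entry is reported only once
    # Step 2: look up each entry of the second file in that set.
    for line in second_lines:
        entry = line.strip()
        if entry not in known and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


first = ["Jonathan25\n", "Donald32\n", "Ethan21\n", "mjisgoat\n"]
second = ["Ethan21\n", "leonardo1111\n", "michalengeloo\n", "Jonathan25\n"]
print(missing_in_first(first, second))  # ['leonardo1111', 'michalengeloo']
```

Set lookups take roughly constant time on average, so this stays fast even when file1 is large.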

newman

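Using awk:

$ awk 'FNR==NR {a[$0]++; next} !a[$0]' first_txt_file second_txt_file

FNR==NR is true while awk reads the first file, so every line of that file is recorded in the array a. Each line of the second file is then printed only if it was never recorded.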

In Python, use sets: https://docs.python.org/3/tutorial/datastructures.html#sets

jared_mamrot
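a = [1, 2, 3, 4]
b = [2, 3, 4, 5]
# Keep the items of b that do not appear in a.
# filter() returns a lazy iterator in Python 3, so wrap it in list().
c = list(filter(lambda x: x not in a, b))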

In this case, c contains only one element, 5. So you can read the contents of file1 into a and the contents of file2 into b, with the newlines stripped. For large files, make a a set so that each "not in" check is fast.

Zhd Zilin

An alternative is set-arithmetic: https://stromberg.dnsalias.org/~strombrg/set-arithmetic/

With set-arithmetic, you can just:

$ set-arithmetic --difference second.txt first.txt 
michalengeloo
leonardo1111

It's written in Python. It treats each line of the input files as a set element.

dstromberg