1

I am trying to do the following: compare two text files (Masterfile and usedfile) and write the values unique to Masterfile (those not present in usedfile) to a third file (Newdata). Both files have one word per line. Example:

Masterfile content

Johnny
transfer
hello
kitty

usedfile content

transfer
hello

expected output in Newdata

Johnny
kitty

I have two solutions, but both have problems.

Solution 1: This prefixes the data in the final output with markers like - and +.

import difflib

with open(r'C:\Master_Data.txt','r') as masterfile:
    with open(r'C:\Used_Data.txt','r') as usedfile:
        with open(r'c:\Ready_to_use.txt','w+') as Newdata:
            tempmaster = masterfile.readlines()
            tempusedfile = usedfile.readlines()
            d = difflib.Differ()
            diff = d.compare(tempmaster,tempusedfile)
            for line in diff:
                Newdata.write(line)

Solution 2: I tried using set. It looks fine when I use a print statement, but I don't know how to write the result to a file.

with open(r'C:\Master_Data.txt','r') as masterfile:
    with open(r'C:\Used_Data.txt','r') as usedfile:
        with open(r'c:\Ready_to_use.txt','w+') as Newdata:
           difference = set(masterfile).difference(set(usedfile))
           print difference

Can anyone suggest:

  1. How I can correct solution 2 to write to a file?
  2. Whether I can use difflib to accomplish the task?
  3. Any better solution to achieve the end result?
Joe_12345

3 Answers

1

Ok,

1) You can use solution 2 to write to a file by adding this:

difference = set(masterfile).difference(set(usedfile))
[Newdata.write(x) for x in difference]

This is shorthand for the following loop:

for x in difference:
    Newdata.write(x)

However, this just writes each element of the difference set to the Newdata file, in no particular order, since sets are unordered. If you use this method, make sure the difference set holds the correct values to start with.
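As a rough sketch of that approach end to end: the function name write_unique_lines and its path parameters are made up for illustration. It assumes one word per line, strips newlines before comparing (so a last line without a trailing newline still matches), and sorts the result only to make the output order predictable.

```python
def write_unique_lines(master_path, used_path, out_path):
    """Write the lines of master_path that do not appear in used_path."""
    with open(master_path) as masterfile, open(used_path) as usedfile:
        # splitlines() drops the line endings, so the last line compares
        # equal even when the file does not end with a newline.
        master_words = set(masterfile.read().splitlines())
        used_words = set(usedfile.read().splitlines())

    difference = master_words - used_words

    with open(out_path, "w") as newdata:
        # Sets are unordered; sorting gives a deterministic output order.
        for word in sorted(difference):
            newdata.write(word + "\n")
    return difference
```

Calling write_unique_lines(r'C:\Master_Data.txt', r'C:\Used_Data.txt', r'C:\Ready_to_use.txt') would then mirror the file names used in the question.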

2) I wouldn't bother with difflib; it's an extra import that isn't needed for something this small.
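If difflib is wanted anyway, one possible sketch is to keep only the lines that Differ marks with "- " (present only in the first sequence) and drop the two-character prefix. The helper name unique_to_master is hypothetical. Note that Differ is order-sensitive, so this is only reliable when the shared words appear in the same relative order in both files, which is an assumption here.

```python
import difflib


def unique_to_master(master_lines, used_lines):
    """Return the lines that Differ reports as present only in master_lines."""
    diff = difflib.Differ().compare(master_lines, used_lines)
    # Differ prefixes every line with a two-character code:
    #   "- " only in the first sequence, "+ " only in the second,
    #   "  " common to both, "? " an intraline hint.
    return [line[2:] for line in diff if line.startswith("- ")]


master = ["Johnny", "transfer", "hello", "kitty"]
used = ["transfer", "hello"]
print(unique_to_master(master, used))
```

For the sample data in the question this should leave just Johnny and kitty; for files whose common words are ordered differently, a set difference is the safer choice.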

3) This is how I would do it, using no libraries and only simple comparisons:

with open(r'Master_Data.txt','r') as masterdata:
    with open(r'Used_Data.txt','r') as useddata:
        with open(r'Ready_to_use.txt','w+') as Newdata:

            usedfile = [ x.strip('\n') for x in list(useddata) ] #1
            masterfile = [ x.strip('\n') for x in list(masterdata) ] #2

            for line in masterfile: #3
                if line not in usedfile: #4
                    Newdata.write(line + '\n') #5

Here's the explanation:

First I just opened all the files like you did, only changing the names of the variables. Now, here are the pieces that I've changed:

#1 - This is a shorthand way of looping through each line in the Used_Data.txt file and removing the \n at the end of each line, so that we can compare the words properly.

#2 - This does the same thing as #1 except with the Master_Data.txt file

#3 - I loop through each line in the Master_Data.txt file

#4 - I check whether the current line from the masterfile list is absent from the usedfile list.

#5 - If the if statement is true, then the line from Master_Data.txt we are checking does not appear in Used_Data.txt, so we write it to the Ready_to_use.txt file using the call Newdata.write(line + '\n'). We need the '\n' at the end so that the next write starts on a new line.
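The steps above can also be sketched as a small function that works on lists of lines instead of open files. The name lines_not_used is made up for illustration, and the used words are held in a set rather than a list, which is a swap from the code above: the result is the same, but the membership test does not scan the whole list each time. The order of the master file is kept.

```python
def lines_not_used(master_lines, used_lines):
    """Return master lines that are absent from used_lines, in master order."""
    # Strip the trailing newline so that words compare properly (steps #1, #2).
    used = {line.rstrip("\n") for line in used_lines}
    result = []
    for line in master_lines:          # step #3
        word = line.rstrip("\n")
        if word not in used:           # step #4
            result.append(word)        # step #5, collected instead of written
    return result
```

The returned list can then be written out with one Newdata.write(word + '\n') per entry, as in step #5.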

BigSpicyPotato
  • Thanks. Also, a general question: can you tell me why we don't have to read the lines using .read() when we pass the file object to list or set? – Joe_12345 Apr 30 '17 at 14:57
  • I actually don't know. I would have used a small loop to read the file lines as that is the way I learned it but I noticed that you used `set(masterfile)` in your code and thought it better to keep with what you knew – BigSpicyPotato Apr 30 '17 at 21:49
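On the question raised in the comments: a Python file object is itself an iterator over its lines, so list(f) and set(f) consume it line by line without an explicit .read() or .readlines() call. A small sketch (the function name read_three_ways is only illustrative) shows that the three forms see the same lines, newline characters included:

```python
def read_three_ways(path):
    """Read the same file by iteration, by readlines(), and into a set."""
    with open(path) as f:
        by_iteration = list(f)       # iterating a file yields one line at a time
    with open(path) as f:
        by_readlines = f.readlines() # explicit call, same list of lines
    with open(path) as f:
        by_set = set(f)              # same lines, but unordered and de-duplicated
    return by_iteration, by_readlines, by_set
```

Each line keeps its trailing '\n' in all three cases, which is why the set-based solutions can write the elements back out without adding a newline.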
0

If the data isn't too big, you can read the lines of each file into a list and compare each element of one list against the other, like this:

with open('test1.txt', 'r') as masterfile:
    with open('test2.txt', 'r') as usedfile:
        with open('test3.txt', 'w+') as Newdata:
            mlines = masterfile.read().splitlines()
            ulines = usedfile.read().splitlines()
            # Lines that appear only in the master file
            for line in mlines:
                if line not in ulines:
                    Newdata.write(line + '\n')
            # Lines that appear only in the used file
            for line1 in ulines:
                if line1 not in mlines:
                    Newdata.write(line1 + '\n')
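Note that the two loops in this answer write the lines unique to either file, which corresponds to a symmetric difference, whereas the question's expected output contains only the lines unique to the master file. A minimal sketch of the distinction, using sets and a made-up helper name compare_lines:

```python
def compare_lines(master_lines, used_lines):
    """Return (lines only in master, lines in exactly one of the two inputs)."""
    master, used = set(master_lines), set(used_lines)
    only_in_master = master - used   # what the question asks for
    in_exactly_one = master ^ used   # what the two loops above produce
    return only_in_master, in_exactly_one
```

With the sample data from the question both results are the same, because the used file holds no word that is missing from the master file; they differ as soon as it does.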
Mohd
0

Using Solution 2:

with open(r'C:\Master_Data.txt','r') as masterfile:
    with open(r'C:\Used_Data.txt','r') as usedfile:
        difference = set(masterfile).difference(usedfile)

with open('Ready_to_use.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)