
I have 3 Notepad (.txt) files in a directory. I want to compare the 1st file with the other 2, drop the blocks that duplicate blocks in the 1st file, and keep only the unique blocks. For example:

File 1:

  User enter email id {
  email id:(xyz@gamil.com)
  action:enter
  data:string }

User enter password {
passoword:(12345678)
action:enter
data:string }

 User click login {
 action:click
 data:NAN }

File 2 :

User enter email id {
email id:(xyz@gamil.com)
action:enter
data:string }

User enter password {
passoword:(12345678)
action:enter
data:string }

 User navigates another page {
 action:navigates
 data:NAN }

File 3 :

 User enter email id {
 email id:(abc@gamil.com)
 action:enter
 data:string }

 User enter password {
 passoword:(12345678)
 action:enter
 data:string }

 User submit to login {
 action:submit
 data:NAN }

I want the output for file 2 and file 3 to be:

File 2 :

 User navigates another page {
 action:navigates
 data:NAN }

File 3 :

 User enter email id {
 email id:(abc@gamil.com)
 action:enter
 data:string }
 
 User submit to login {
 action:submit
 data:NAN }


Open the first file and split it into a list of paragraphs (blocks separated by a blank line):

with open('file1.txt', 'r') as f:
    paragraphs = f.read().split('\n\n')

Now open the second file, split it into paragraphs the same way, and keep only the paragraphs that are not in the first file:

with open('file2.txt', 'r') as f:
    paragraphs2 = f.read().split('\n\n')
    paragraphs2 = [x for x in paragraphs2 if x not in paragraphs]
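The `x not in paragraphs` test above is an exact string match, so two blocks that differ only in indentation or in a trailing newline count as different. In the question's example, File 3's blocks are indented by one space while File 1's are not, and the last block read from a file usually keeps the file's final newline. A minimal sketch, assuming leading and trailing whitespace on each line is not meaningful, compares a normalized key for each block instead. `split_blocks`, `normalize` and `unique_blocks` are hypothetical helper names, and the regex split (which also treats whitespace-only lines as separators) is a swap for the answer's plain `split('\n\n')`.

```python
import re

# One blank line (possibly containing spaces or tabs) separates two blocks.
BLANK_LINE = re.compile(r"\n\s*\n")


def split_blocks(text):
    """Split text into non-empty blocks separated by blank lines."""
    return [b for b in BLANK_LINE.split(text) if b.strip()]


def normalize(block):
    """Whitespace-insensitive key: strip every line and drop empty lines."""
    return "\n".join(line.strip() for line in block.splitlines() if line.strip())


def unique_blocks(reference_text, target_text):
    """Return the blocks of target_text whose key does not occur in reference_text."""
    seen = {normalize(b) for b in split_blocks(reference_text)}  # set: fast lookups
    return [b for b in split_blocks(target_text) if normalize(b) not in seen]


file1 = "User enter password {\npassoword:(12345678)\naction:enter\ndata:string }\n"
file3 = (" User enter password {\n passoword:(12345678)\n action:enter\n data:string }\n\n"
         " User submit to login {\n action:submit\n data:NAN }\n")

# Only the "User submit to login" block survives, despite the different indentation.
print(unique_blocks(file1, file3))
```

The kept blocks are returned exactly as they appear in the target file; only the comparison uses the normalized form.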

Now write the result back to the second file (opening it with 'w' overwrites it):

with open('file2.txt', 'w') as f:
    f.write('\n\n'.join(paragraphs2))

Do the same for the third file:

with open('file3.txt', 'r') as f:
    paragraphs3 = f.read().split('\n\n')
    paragraphs3 = [x for x in paragraphs3 if x not in paragraphs]

with open('file3.txt', 'w') as f:
    f.write('\n\n'.join(paragraphs3))

What if there are many files? Use a loop instead, as shown below.

First, build the list of paragraphs from the first file:

with open('file1.txt', 'r') as f:
    paragraphs = f.read().split('\n\n')

Then build a list of all the other .txt files in the current directory, i.e. the files to remove duplicates from:

import os
lst = [f for f in os.listdir('.') if f.endswith('.txt') and f != 'file1.txt']
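`os.listdir` returns entries in arbitrary order, includes subdirectories whose names happen to end in `.txt`, and is relative to the current working directory rather than the script's location. A small sketch using `pathlib` instead, assuming you want only regular files processed in name order; `other_txt_files` is a hypothetical helper name.

```python
from pathlib import Path


def other_txt_files(directory, reference_name):
    """Return the regular .txt files in directory, sorted by name, minus the reference file."""
    return sorted(
        p for p in Path(directory).glob("*.txt")
        if p.is_file() and p.name != reference_name
    )


# Example: list the files that would be processed in the current directory.
for path in other_txt_files(".", "file1.txt"):
    print(path.name)
```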

Now loop through that list and rewrite each file:

for f in lst:
    with open(f, 'r') as file:
        paragraphs_in_other_files = file.read().split('\n\n')
        paragraphs_in_other_files = [p for p in paragraphs_in_other_files if p not in paragraphs]

    with open(f, 'w') as file:
        file.write('\n\n'.join(paragraphs_in_other_files))
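With 150+ files, two small changes can help: read the reference file once into a `set` (membership tests stay fast however many blocks it has), and write the results to a separate folder so the originals survive if something goes wrong. A sketch under those assumptions; `dedupe_against` and the `out_dir` parameter are hypothetical, blocks are still split on `"\n\n"` as in the answer, and only whitespace around a whole block (such as the file's final newline) is ignored when comparing.

```python
from pathlib import Path


def dedupe_against(reference, targets, out_dir):
    """Write each target's blocks that are not in reference to out_dir/<same file name>."""
    ref_text = Path(reference).read_text(encoding="utf-8")
    # Stripped reference blocks, kept in a set for fast membership tests.
    seen = {b.strip() for b in ref_text.split("\n\n") if b.strip()}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for target in targets:
        target = Path(target)
        blocks = [b.strip() for b in target.read_text(encoding="utf-8").split("\n\n") if b.strip()]
        kept = [b for b in blocks if b not in seen]
        text = "\n\n".join(kept) + ("\n" if kept else "")
        (out / target.name).write_text(text, encoding="utf-8")
```

For example, `dedupe_against("file1.txt", [p for p in Path(".").glob("*.txt") if p.name != "file1.txt"], "unique")` would write the results into a (hypothetical) `unique` folder. Passing the targets' own directory as `out_dir` overwrites them in place, like the loop above.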
Geeky Quentin
  • Thanks!!! It is working fine, but what if there are more than 150 files to compare? It's not ideal to read and write every file this way. How can I put it in a loop? – Soumya_jeet Jul 11 '22 at 05:56
  • Create a list of files, i.e. `['file2.txt', 'file3.txt', .... ]` and loop through them, read them, and modify them. – Geeky Quentin Jul 11 '22 at 06:15
  • @Soumya_jeet I have modified my answer, take a look – Geeky Quentin Jul 11 '22 at 06:28
  • Thanks!! Another concern: what if I want to compare every file with each other, for example file 1 against 2 and 3, file 2 against 3 and 1, and file 3 against 2 and 1? – Soumya_jeet Jul 15 '22 at 06:42
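One possible reading of this follow-up is "keep, in each file, only the blocks that appear in no other file". Rewriting files one after another would make the result depend on the processing order, so a sketch under that assumption reads every file first, counts how many files each block occurs in, and then filters. `blocks_of` and `keep_unique_across` are hypothetical names; blocks are split on `"\n\n"` and stripped, as in the answer.

```python
from collections import Counter
from pathlib import Path


def blocks_of(path):
    """Return the non-empty, stripped blocks of a file, split on blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [b.strip() for b in text.split("\n\n") if b.strip()]


def keep_unique_across(paths):
    """Map each path to the blocks that occur in no other file in paths.

    Every file is read before anything is compared, so the result does not
    depend on the order of paths.
    """
    contents = {Path(p): blocks_of(p) for p in paths}
    # Count how many *files* contain each block (set() so repeats inside one file count once).
    file_count = Counter(b for blocks in contents.values() for b in set(blocks))
    return {p: [b for b in blocks if file_count[b] == 1] for p, blocks in contents.items()}
```

Nothing is written by this function; to apply the result, loop over `keep_unique_across(sorted(Path(".").glob("*.txt"))).items()` and write each list of blocks back (or to another folder) joined with `"\n\n"`.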