I have code that opens two files, stores their contents in sets (set1 and set2), and writes the results of a pairwise comparison between these sets to an output file. Both files are large (more than 100K lines each), and the code takes a long time to finish (more than 10 hours).
Is there a way to optimize its performance?
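def matches2smiles():
    # Read each file into a set of lines with the trailing newline removed.
    with open('file1.txt') as f:
        set1 = {a.rstrip('\n') for a in f}
    with open('file2.txt') as g:
        set2 = {b.replace('\n', '') for b in g}
    # Write every line b from file2 that contains a line a from file1
    # as a substring (b is written once per matching a).
    with open('output.txt', 'w') as h:
        r = [
            h.write(b + '\n')
            for a in set1
            for b in set2
            if a in b
        ]

matches2smiles()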
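
For scale, with 100K lines in each file the nested loops above perform on the order of 100,000 × 100,000 = 10^10 substring checks. One possible direction, shown here only as a minimal sketch, is to look up every substring of each line from file2 in set1, trying only the lengths that actually occur in set1. This is not necessarily the best approach; it assumes the lines are fairly short and that set1 contains few distinct line lengths. The names `find_matches`, `needles`, and `haystacks` are hypothetical and chosen for illustration.

```python
def find_matches(needles, haystacks):
    """Yield each haystack once per distinct needle it contains.

    Hypothetical helper: it produces the same lines as the nested
    `if a in b` loops, though not necessarily in the same order.
    """
    needles = set(needles)
    # Only substring lengths that occur among the needles can match.
    lengths = sorted({len(n) for n in needles})
    for hay in haystacks:
        found = set()
        for size in lengths:
            if size > len(hay):
                break  # the remaining lengths are longer still
            for start in range(len(hay) - size + 1):
                piece = hay[start:start + size]
                if piece in needles:  # average O(1) set lookup
                    found.add(piece)
        # Emit the line once for every distinct needle found in it.
        for _ in found:
            yield hay
```

Under these assumptions, the write step could become `h.writelines(b + '\n' for b in find_matches(set1, set2))`. The work per line of file2 then depends on its length and on the number of distinct lengths in set1 rather than on the size of set1.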