I have a text file containing a list of names with repeats (some of which contain accented letters such as é, à, î, etc.).
e.g. List: Précilia, Maggie, Précilia
I need to write code that outputs only the unique names.
However, my text file seems to use different character encodings for the é in the two occurrences of Précilia (I am guessing perhaps ASCII for one and UTF-8 for the other). As a result, my code treats the two occurrences of Précilia as different unique elements. My code is below:
seen = set()
with open('./Desktop/input1.txt') as infile:
    with open('./Desktop/output.txt', 'w') as outfile:
        for line in infile:
            if line not in seen:
                outfile.write(line)
                seen.add(line)
Expected output: Précilia, Maggie
Actual (incorrect) output: Précilia, Maggie, Précilia
Update: The original file is very large. I need a way to treat both of these occurrences as a single one.
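One possible explanation, which the post does not confirm, is that the two é's are both valid Unicode but in different normalization forms: one a single precomposed character (U+00E9), the other a plain "e" followed by a combining acute accent (U+0301). Under that assumption, a minimal sketch of the deduplication would normalize each line before comparing it. The helper name `unique_lines` and the sample data are hypothetical, chosen only for illustration.

```python
import unicodedata


def unique_lines(lines, form="NFC"):
    """Yield each name once, comparing names by their normalized form.

    Assumption: duplicates differ only in Unicode normalization
    (precomposed vs. decomposed accents), not in actual encoding.
    """
    seen = set()
    for line in lines:
        # Drop the trailing newline so a final line without one still matches.
        key = unicodedata.normalize(form, line.rstrip("\n"))
        if key not in seen:
            seen.add(key)
            yield key


# Hypothetical sample: the first é is precomposed (U+00E9),
# the last one is "e" + combining acute accent (U+0301).
sample = ["Pr\u00e9cilia\n", "Maggie\n", "Pre\u0301cilia\n"]
result = list(unique_lines(sample))
print(result)
```

Because `unique_lines` is a generator, it can be fed a file object opened with an explicit encoding (for example `open(path, encoding="utf-8")`) and will read it line by line, so only the set of unique names is held in memory. If the file really does mix two different byte encodings, normalization alone would not be enough and the bytes would need to be decoded consistently first.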