I wrote a tiny Biopython script to extract sequences from a FASTA file based on their IDs, but it also extracts duplicates. I am looking for a way to filter out the duplicate sequences (i.e. records that have the exact same ID) from my FASTA files.
I tried to modify my script, but it does not work:
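    from Bio import SeqIO

    # IDs to extract, one per line (surrounding double quotes are stripped)
    id = []
    for line in open("short.txt", "r"):
        id.append(line.rstrip().strip('"'))

    for rec in SeqIO.parse("out.fa", "fasta"):
        #print rec.id
        if rec.id in id:
            # My attempt at skipping duplicates. This is the part that fails:
            # rec.format is a method, not a collection of IDs already printed,
            # so this test raises a TypeError.
            if rec.id not in rec.format:
                print(rec.format("fasta"))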
Can anyone help?
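
For reference, here is a minimal sketch of the behaviour I am after: remember which IDs have already been written in a set, and skip any record whose ID is already in it. It assumes that short.txt holds one ID per line and that keeping only the first record for each ID is acceptable. The function names (load_ids, first_occurrences, extract_unique) are hypothetical names made up for this example; they are not part of Biopython.

```python
import sys


def load_ids(path):
    """Read one ID per line, stripping whitespace and surrounding double quotes."""
    with open(path) as handle:
        return {line.strip().strip('"') for line in handle if line.strip()}


def first_occurrences(records, wanted):
    """Yield each record whose .id is in `wanted`, skipping repeated IDs."""
    seen = set()  # IDs that have already been yielded
    for rec in records:
        if rec.id in wanted and rec.id not in seen:
            seen.add(rec.id)
            yield rec


def extract_unique(fasta_path, ids_path, out=sys.stdout):
    """Write the first record for each wanted ID to `out` in FASTA format."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import SeqIO

    wanted = load_ids(ids_path)
    records = SeqIO.parse(fasta_path, "fasta")
    # SeqIO.write returns the number of records written.
    return SeqIO.write(first_occurrences(records, wanted), out, "fasta")
```

With my file names this would be called as extract_unique("out.fa", "short.txt"). Storing the wanted IDs in a set instead of a list should also make the membership test faster when the ID list is long.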