I'm trying to sort a FASTA file alphabetically by the sequences themselves (not by the sequence IDs). The file contains over 200 sequences, and I'm trying to find duplicates (by duplicates I mean almost the same protein sequence, but not the same ID) a bit faster, using Python code. So I wanted to build a dictionary from the FASTA file and then sort the dictionary's values. The code I am trying to use is the following:
from Bio import SeqIO
input_file = open("PP_Seq.fasta")
my_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
print sorted(my_dict.values())
I keep getting this error message:
"Traceback (most recent call last):
File "sort.py", line 4, in <module>
print sorted(my_dict.values())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqRecord.py", line 730, in __lt__
raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest."
I also tried to look up how to fix this error, but there isn't much information about it, and the little I read apparently says that the length of the sequences stored in the dictionary may be a problem? If so, how can I sort the FASTA file without SeqIO?
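To make the last question concrete, here is a minimal sketch of sorting by sequence without SeqIO, using only the standard library. The helper names `read_fasta` and `sort_by_sequence` and the three demo records are made up for illustration; they are not part of Biopython or of my real file. It assumes a plain FASTA layout (header lines starting with ">", sequence possibly wrapped over several lines).

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle.

    Hypothetical helper: joins wrapped sequence lines and skips blank lines.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def sort_by_sequence(records):
    """Sort (header, sequence) pairs alphabetically by the sequence string."""
    return sorted(records, key=lambda rec: rec[1])


# Made-up demo data standing in for PP_Seq.fasta.
demo = io.StringIO(u">id1\nMKV\nLLA\n>id2\nACDE\n>id3\nMKVLLA\n")
sorted_records = sort_by_sequence(read_fasta(demo))

# Identical sequences end up next to each other after sorting.
for header, seq in sorted_records:
    print(">%s\n%s" % (header, seq))
```

With the real file, `demo` would be replaced by `open("PP_Seq.fasta")`. The same idea should also work while keeping SeqIO, by giving `sorted` an explicit key as the error message suggests, e.g. `sorted(my_dict.values(), key=lambda rec: str(rec.seq))`. Note that sorting only puts exact matches and sequences sharing a prefix next to each other; "almost the same" sequences that differ near the start will not end up adjacent.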