I am relatively new to Python, so please forgive any naive parts of this question. I have a GenBank file and have written a piece of code that takes the three longest genes and writes them to a newly created FASTA file.
from Bio import SeqIO

file = "sequence.gb"
# SeqIO.parse returns an iterator of SeqRecords; take the first record in the file
rec = next(SeqIO.parse(file, "genbank"))

with open("Top3.faa", "w") as output:
    for f in rec.features:
        # only consider coding sequences that carry a gene name
        if f.type == 'CDS' and 'gene' in f.qualifiers:
            start = int(f.location.start)
            end = int(f.location.end)
            length = end - start
            # hard-coded 7 kb cutoff: this is the part I want to replace
            if length > 7000:
                # qualifier values are lists, so take the first entry
                name = f.qualifiers['gene'][0]
                seq = str(rec.seq[start:end])
                output.write('>' + name + ' ' + str(length) + '\n')
                output.write(seq + '\n')

print('The genes with the top 3 longest lengths have been saved in Top3.faa')
What I'm trying to do is, instead of manually hard-coding a check for genes over 7 kb, have the code find the top 3 hits by itself. Any pointers on which direction to take would be much appreciated. Thanks!
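One possible direction, as a minimal sketch rather than a definitive answer: assuming the first record in `sequence.gb` is the one of interest (as in the code above), collect every named CDS as a `(name, start, end)` tuple, then let `heapq.nlargest` pick the three with the largest `end - start` instead of applying a fixed cutoff. The helper names `top_n_longest`, `cds_candidates` and `write_top_genes` and the tuple format are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical intermediate format: (gene name, start, end) for one CDS
Candidate = Tuple[str, int, int]


def top_n_longest(candidates: Iterable[Candidate], n: int = 3) -> List[Candidate]:
    """Return the n candidates with the largest end - start, longest first."""
    return heapq.nlargest(n, candidates, key=lambda c: c[2] - c[1])


def cds_candidates(rec) -> List[Candidate]:
    """Collect (name, start, end) for every CDS feature with a 'gene' qualifier."""
    out = []
    for f in rec.features:
        if f.type == "CDS" and "gene" in f.qualifiers:
            # qualifier values are lists, so take the first gene name
            out.append((f.qualifiers["gene"][0], int(f.location.start), int(f.location.end)))
    return out


def write_top_genes(gb_path: str, out_path: str, n: int = 3) -> None:
    """Write the n longest named CDS regions of the first record to a FASTA file."""
    # Biopython is imported here so top_n_longest can be used without it
    from Bio import SeqIO

    rec = next(SeqIO.parse(gb_path, "genbank"))
    with open(out_path, "w") as output:
        for name, start, end in top_n_longest(cds_candidates(rec), n):
            # same plain start:end slice as the original code
            output.write(">" + name + " " + str(end - start) + "\n")
            output.write(str(rec.seq[start:end]) + "\n")
```

Calling `write_top_genes("sequence.gb", "Top3.faa")` would then stand in for the manual loop. `sorted(candidates, key=..., reverse=True)[:3]` gives the same result; `nlargest` just avoids sorting the whole list. Like the original, this slices `rec.seq` by start and end, so a minus-strand CDS is written unreversed and a joined CDS includes the sequence between its parts; Biopython's `f.extract(rec.seq)` handles strand and joins if that matters for your file.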