Your `if` is outside the `for` loop, so it runs only once, after the loop has finished, using the values the variables had at the end of the last iteration. If you want the `if` to run on every iteration, indent it to the same level as the code before it:
```python
from Bio import SeqIO

for record in SeqIO.parse("dnaseq.fasta", "fasta"):
    protein_id = record.id
    protein1 = record.seq.translate(to_stop=True)
    protein2 = record.seq[1:].translate(to_stop=True)
    protein3 = record.seq[2:].translate(to_stop=True)
    # Same indentation level, still in the loop
    if len(protein1) > len(protein2) and len(protein1) > len(protein3):
        protein = protein1
    elif len(protein2) > len(protein1) and len(protein2) > len(protein3):
        protein = protein2
    else:
        protein = protein3
```
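To see the difference without needing Biopython or a FASTA file, here is a minimal sketch in which plain strings stand in for the three translated frames of each record. The names `frames_per_record` and `pick` are made up for illustration; `pick` just wraps the same if/elif/else logic.

```python
# Hypothetical stand-ins for the three reading-frame translations of two records
frames_per_record = [("MKV", "M", "MK"), ("A", "AAAA", "AA")]

def pick(p1, p2, p3):
    # Same selection logic as the if/elif/else above
    if len(p1) > len(p2) and len(p1) > len(p3):
        return p1
    elif len(p2) > len(p1) and len(p2) > len(p3):
        return p2
    else:
        return p3

# Selection inside the loop: one result per record
inside = []
for p1, p2, p3 in frames_per_record:
    inside.append(pick(p1, p2, p3))

# Selection after the loop: only the last record's values are still around
for p1, p2, p3 in frames_per_record:
    pass
outside = pick(p1, p2, p3)

print(inside)   # ['MKV', 'AAAA']
print(outside)  # AAAA
```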
Your function `prot_record` uses the current values of `protein` and `protein_id`, which, again, are whatever they were at the end of the last iteration of the `for` loop.
If I'm guessing correctly what you want, one possibility is to put the function definition inside the loop as well, so that each iteration creates a function tied to that iteration's values, and to store these functions in a list for later use when iterating over the records again. Note that Python closures look up variables when they are called, not when they are defined, so the current values have to be bound as default arguments; otherwise every stored function would again see the values from the last iteration:
```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# List of functions, one per record:
record_makers = []
for record in SeqIO.parse("dnaseq.fasta", "fasta"):
    protein_id = record.id
    protein1 = record.seq.translate(to_stop=True)
    protein2 = record.seq[1:].translate(to_stop=True)
    protein3 = record.seq[2:].translate(to_stop=True)
    # still in the loop
    if len(protein1) > len(protein2) and len(protein1) > len(protein3):
        protein = protein1
    elif len(protein2) > len(protein1) and len(protein2) > len(protein3):
        protein = protein2
    else:
        protein = protein3
    # still in the loop; the default arguments freeze this iteration's values
    def prot_record(record, protein=protein, protein_id=protein_id):
        # No ">" needed in the id: SeqIO.write adds it when writing FASTA
        return SeqRecord(seq = protein, \
                         id = protein_id, \
                         description = "translated sequence")
    record_makers.append(prot_record)

# zip the functions and the records together instead of
# mapping one single function to all the records
records = [record_maker(record) for record_maker, record in zip(
    record_makers, SeqIO.parse("dnaseq.fasta", "fasta"))]
SeqIO.write(records, "AAseq.fasta", "fasta")
```
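The reason functions defined inside a loop need care is Python's late binding: a closure reads the loop's variables when it is called, not when it is defined. A minimal sketch without Biopython (the names `late`, `bound` and `make` are invented for illustration) shows the effect and the default-argument workaround:

```python
# Closures created in a loop look up `name` when called, not when defined
late = []
for name in ["seq1", "seq2", "seq3"]:
    def make(record):
        return name
    late.append(make)

# Binding the current value as a default argument freezes it at definition time
bound = []
for name in ["seq1", "seq2", "seq3"]:
    def make(record, name=name):
        return name
    bound.append(make)

print([f(None) for f in late])   # ['seq3', 'seq3', 'seq3']
print([f(None) for f in bound])  # ['seq1', 'seq2', 'seq3']
```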
Another possible approach is to put the translation logic inside the record-making function:
```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def find_translation(record):
    protein1 = record.seq.translate(to_stop=True)
    protein2 = record.seq[1:].translate(to_stop=True)
    protein3 = record.seq[2:].translate(to_stop=True)
    if len(protein1) > len(protein2) and len(protein1) > len(protein3):
        protein = protein1
    elif len(protein2) > len(protein1) and len(protein2) > len(protein3):
        protein = protein2
    else:
        protein = protein3
    return protein

def prot_record(record):
    protein = find_translation(record)
    # By the way: no need for backslashes here
    # (and no ">" in the id: SeqIO.write adds it when writing FASTA)
    return SeqRecord(seq=protein,
                     id=record.id,
                     description="translated sequence")

records = map(prot_record, SeqIO.parse("dnaseq.fasta", "fasta"))
SeqIO.write(records, "AAseq.fasta", "fasta")
```
This is possibly cleaner, though I haven't tested it.
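If you want `find_translation` shorter, the built-in `max` with `key=len` picks the longest frame directly, e.g. `return max((protein1, protein2, protein3), key=len)`, since `Seq` objects support `len()`. One behavioural difference worth knowing: when two frames tie for the longest, the if/elif chain falls through to the third frame even if it is the shortest, whereas `max` returns the first of the longest. The sketch below uses plain strings in place of `Seq` objects (an assumption so it runs without Biopython); `pick_if_chain` and `pick_longest` are hypothetical names.

```python
def pick_if_chain(p1, p2, p3):
    # The selection logic used in find_translation
    if len(p1) > len(p2) and len(p1) > len(p3):
        return p1
    elif len(p2) > len(p1) and len(p2) > len(p3):
        return p2
    else:
        return p3

def pick_longest(p1, p2, p3):
    # max returns the first item with the greatest length
    return max((p1, p2, p3), key=len)

# Frames 1 and 2 tie at length 4; frame 3 is shorter
print(pick_if_chain("MKVL", "MAAA", "M"))  # M
print(pick_longest("MKVL", "MAAA", "M"))   # MKVL
```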