I'm trying to include a trimming step in my script to remove 20bp of sequence from the 3' end of my sequence if the sequence is >290bp long. Here is the part of the script:
from Bio import SeqIO
fasta1 = SeqIO.parse(path+'\FolderMulti.fasta', "fasta") #reading in the multifasta to make a seq object
fasta2 = [] # make new list
for rec in fasta1:
if len (rec.seq) >290: #if sequence length is >290bp
rec.seq = rec.seq[:-20] #delete the last 20bp (to get rid of any possible primer seq)
fasta2.append(rec) #add trimmed seq record to new list
else:
fasta2.append(rec) #if seq is <290bp, will just add the usual record to the new list
SeqIO.write(fasta2, path+'\FolderMulti.fasta', "fasta")
When I run my script, it seems that all sequences >290 are trimmed apart from the last sequence in my multifasta and I can't figure out why...
In addition, there seems to be something wrong with the part where the sequences <290bp are added to the new list as in the downstream script they return no BLAST results whereas sequences >290bp do (and before I added this trimming step, these sequences returned results as expected). Any ideas about this also would be appreciated.
Test fasta:
>test1
CCTTTACACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATA
>test2
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATAA
>test3
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCC
>test4
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATAA
>test5
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCC
>test6
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACG