0

I'm trying to include a trimming step in my script to remove 20bp of sequence from the 3' end of my sequence if the sequence is >290bp long. Here is the part of the script:

from Bio import SeqIO

fasta1 = SeqIO.parse(path+'\FolderMulti.fasta', "fasta") #reading in the multifasta to make a seq object
fasta2 = []  # make new list
for rec in fasta1:

    if len (rec.seq) >290: #if sequence length is >290bp
        rec.seq = rec.seq[:-20] #delete the last 20bp (to get rid of any possible primer seq)
        fasta2.append(rec) #add trimmed seq record to new list
    else:
        fasta2.append(rec) #if seq is <290bp, will just add the usual record to the new list
SeqIO.write(fasta2, path+'\FolderMulti.fasta', "fasta") 

When I run my script, it seems that all sequences >290 are trimmed apart from the last sequence in my multifasta and I can't figure out why...

In addition, there seems to be something wrong with the part where the sequences <290bp are added to the new list as in the downstream script they return no BLAST results whereas sequences >290bp do (and before I added this trimming step, these sequences returned results as expected). Any ideas about this also would be appreciated.

Test fasta:
>test1
CCTTTACACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATA

>test2
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATAA

>test3
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCC

>test4
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACGTTCCGATAA

>test5
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCC

>test6
ACCCGAAGGCCTTCTTCAGACACGCGGCATGGCTGGATCAGGCTTGCGCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCGGATCATCCTCTCAGACCCGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCCGCCAACCAGCTAATCAGATATCGGCCGCTCGGATAGCGCAAGGCCCGAAGGTCCCCTGCTTTCCCTCTCAAGACGTATGCGGTATTAGCTGATCTTTCGATCAGTTATCCCCCGCTACCCGGTACG
Michael M.
  • 10,486
  • 9
  • 18
  • 34
Jess
  • 13
  • 1
  • 5
  • Can you post an example FASTA file so we can test our code? – MattDMo Aug 18 '22 at 14:28
  • Sorry, what is the best way to do this as I made an example fasta but its too long to comment? – Jess Aug 18 '22 at 14:52
  • Please [edit] your question and put the text there, using the `{}` button to format it. Try to limit it to less than 40 lines or so. Even 10 records would be fine. – MattDMo Aug 18 '22 at 14:56
  • 1
    Your code worked for me using your example data. `test6` was trimmed from 296 to 276. The others were trimmed as appropriate - all but `test3` (exactly 290) and `test5` (234) were trimmed by 20. – MattDMo Aug 18 '22 at 15:32
  • 2
    Code appears correct. Maybe not great to read and write to the same file though (especially if you ran the same script multiple times) – Chris_Rands Aug 21 '22 at 21:06

0 Answers0