What's the easiest way to trim or pad a group of Biopython FASTA files until they are all a certain length, so that I can add them to a multiple sequence alignment?
This is related to "BioPython AlignIO ValueError says strings must be same length?". I need something similar to the answer there, except with multiple sequences and no text file, and in the end everything should be incorporated into a multiple sequence alignment. The end goal is for all sequences to be 570 characters long. I intend to use all of this to build a phylogenetic tree.
-
Do you have an example and a desired result that can help us understand what you need? – codrelphi Dec 01 '19 at 01:00
-
https://stackoverflow.com/questions/32833230/biopython-alignio-valueerror-says-strings-must-be-same-length Something like the answer to this, except I need it done to multiple sequences, and I don't know in advance whether each sequence is too short or too long. Also, I use FASTA files instead of text files. – james latimer Dec 01 '19 at 01:08
-
Rather than trimming/padding your sequences, shouldn't you be aligning them? `Bio.AlignIO` is for manipulating sequences that are already aligned, and if you want to build a phylogeny, it is based on the alignment with gaps appropriately placed, not on arbitrary padding. – Chris_Rands Dec 01 '19 at 13:29
1 Answer
I'm not familiar with Biopython, but I know you can easily do this with pysam by reading the FASTA file, looping over each sequence, trimming it to a certain size, and writing it to a new FASTA file. See the example below:

    from pysam import FastxFile

    fasta_q_file = "INPUT.fasta"
    out_filename = "OUTPUT_NAME.fasta"
    size_size_trim = 50

    with FastxFile(fasta_q_file) as fh, open(out_filename, mode="w") as fout:
        for entry in fh:
            sequence_id = entry.name
            sequence = entry.sequence
            if len(sequence) > size_size_trim:
                # Longer sequences are cut, and the header records the trim.
                fout.write(">{}_trimmed_to_{}_bp\n{}\n".format(sequence_id, size_size_trim, sequence[:size_size_trim]))
            elif len(sequence) == size_size_trim:
                # Sequences that already have the target length are written unchanged.
                fout.write(">{}\n{}\n".format(sequence_id, sequence))
            else:
                # Sequences shorter than `size_size_trim` are not written.
                continue
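
Since the question asks for Biopython and for padding as well as trimming, here is a minimal sketch of that variant. The helper names `fit_to_length` and `build_alignment` are made up for illustration, and the gap character `-` as padding is an assumption. As noted in the comments, padding only makes the lengths match so that `MultipleSeqAlignment` accepts the records; it does not align anything, so a real aligner is still needed before building a tree.

```python
def fit_to_length(sequence, length, pad_char="-"):
    """Trim `sequence` to `length`, or right-pad it with `pad_char`."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # Slicing handles sequences that are too long; ljust handles short ones.
    return sequence[:length].ljust(length, pad_char)


def build_alignment(fasta_paths, length=570):
    """Read every record from the given FASTA files into one fixed-width alignment."""
    # Imported here so that fit_to_length() works without Biopython installed.
    from Bio import SeqIO
    from Bio.Align import MultipleSeqAlignment
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord

    records = []
    for path in fasta_paths:
        for record in SeqIO.parse(str(path), "fasta"):
            fitted = fit_to_length(str(record.seq), length)
            records.append(
                SeqRecord(Seq(fitted), id=record.id, description=record.description)
            )
    # All records now have the same length, so this no longer raises ValueError.
    return MultipleSeqAlignment(records)
```

With the FASTA files in a hypothetical `sequences/` directory, this could be called as `build_alignment(sorted(pathlib.Path("sequences").glob("*.fasta")))`, and the result written out with `Bio.AlignIO.write`.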

metageni