-1

I have a circular mtDNA reference. After the alignment I would like to cut first and last 15 bases in my fasta files. How can I do that?

For example, this is my sequence and I need to take out first and last 15 letters. The first 15 characters will not always be the same for all the samples.

>ref|NC_005038.2| mitochondrion_circulized, complete genome
TTTACATGGTAAGTGGTTGATGTAGCTTAAACTTAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTACC
AACTCCATAAACACATAGGTTTGGTCCCAGCCTTCCTGTTAACTCTCAACAGACTTACACATGCAAGCAT
CCACGCCCCGGTGAGTAACGCCCTCCAAATCAATAAGACTAAGAGGAGCAGGTATCAAGCACACATCTTG
TAGCTTATAACGCCTCGCTTAACCACACCCCTACGGGAGACAGCAGTGACAAAAATTAAGCCATAAACGA

The output I expect would be:

>ref|NC_005038.2| mitochondrion_circulized, complete genome
GTTGATGTAGCTTAAACTTAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTACCAACTCCATAAACACA
TAGGTTTGGTCCCAGCCTTCCTGTTAACTCTCAACAGACTTACACATGCAAGCATCCACGCCCCGGTGAG
TAACGCCCTCCAAATCAATAAGACTAAGAGGAGCAGGTATCAAGCACACATCTTGTAGCTTATAACGCCT
CGCTTAACCACACCCCTACGGGAGACAGCAGTGACAAAAA
Timur Shtatland
  • 12,024
  • 2
  • 30
  • 47
Anna
  • 53
  • 6

1 Answers1

1

Here is one way to do it using Python's Biopython module:

from Bio import SeqIO
import textwrap #use it to wrap it to certain column size

input_file   = open('input.fasta','r')
output_file  = open('output.fasta', "w")

for seq_record in SeqIO.parse(input_file, "fasta"):
    sequence = str(seq_record.seq)
    trimmed_sequence = sequence[15:-15]
    wrapped_sequene  = textwrap.fill(trimmed_sequence, width = 60)
    fasta_record = f'>{seq_record.id}\n{wrapped_sequene}'
    output_file.write(fasta_record)
    print(fasta_record) # will print on the screen

input_file.close()
output_file.close()
Supertech
  • 746
  • 1
  • 9
  • 25