2

I am writing a script which can replace all of the instances of an amino acid residue in a column of a FASTA alignment file. Using AlignIO, I just can read an alignment file and extract information from it but I can't modify their sequences. Moreover, MutableSeq module just can be modify string sequences and if I use a seq object input, it can't modify it. I'd like to find a module or a method to modify an alignment file and save it, while it is in the structure of AlignIO as a sequence object for subsequent procedures.

My code using just AlignIO:

alignment = AlignIO.read(input_handle, "fasta")
for record in alignment:
    if record.seq[10] == "G":
        record.seq[10] = "T"

Output:

record.seq[10] = "T"
TypeError: 'Seq' object does not support item assignment

My code using both AlignIO and MutableSeq:

alignment = AlignIO.read(input_handle, "fasta")
for record in alignment[0:1, : ]:
    mut_align = MutableSeq(record.seq)
    mut_align.__delitem__(10)
    mut_align.insert(10, "T")
    print(mut_align)

Output:

del self.data[index]
TypeError: 'Seq' object doesn't support item deletion
Chris_Rands
  • 38,994
  • 14
  • 83
  • 119
Reza Rezaei
  • 487
  • 4
  • 12

1 Answers1

2

This is a good question, I think what you're doing with MutableSeq should work or fail clearly right away, but instead here is a workaround:

from Bio import AlignIO

alignment = AlignIO.read('test.fasta', 'fasta')
for record in alignment:
    record.seq = record.seq.tomutable()
    if record.seq[2] == "G":
        record.seq[2] = "T"
    print(record)

Outputs:

ID: 1
Name: 1
Description: 1
Number of features: 0
MutableSeq('ATTAG')
ID: 2
Name: 2
Description: 2
Number of features: 0
MutableSeq('AATAG')

For input data:

$ cat test.fasta 
>1
ATGAG
>2
AAGAG

I consider the fact that a MutableSeq object is created from a Seq object in your example but that the method fails as a bug, which I filed here: https://github.com/biopython/biopython/issues/1918


Here is another rather inefficient workaround, re-building the string each time, if you want to avoid using MutableSeq all-together:

alignment = AlignIO.read('test.fasta', 'fasta')
for record in alignment:
    if record.seq[2] == "G":
        record.seq = record.seq[:2] + 'T' + record.seq[3:]
    print(record)
Chris_Rands
  • 38,994
  • 14
  • 83
  • 119
  • Thanks for your answer. However, I believe there should be a Biopython function or method for this problem! I think it is too obvious, that people need to modify alignment files, to be neglected by Biopython! – Reza Rezaei Feb 01 '19 at 17:01
  • @RezaRezaei See my edited answer, I have provided a workaround and filed an issue with Biopython – Chris_Rands Feb 01 '19 at 23:04
  • Thanks for the edited answer and also filing the issue with Biopython. I think I am gonna use the without MutableSeq method, which you wrote first. – Reza Rezaei Feb 02 '19 at 02:56