
Hello. I am writing a function that finds identical columns of an alignment and stores them in a dictionary whose key is the column (as a string) and whose value is a list of the indexes where that column occurs. I am having some difficulty. So far my code only builds a single alignment:

from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Bio.Alphabet (IUPAC, Gapped, generic_dna) was removed in Biopython 1.78,
# so the sequences are created without an alphabet argument.
align1 = MultipleSeqAlignment([
    SeqRecord(Seq("ACTGCTAGCTAG"), id="Alpha"),
    SeqRecord(Seq("ACT-CTAGCTAG"), id="Beta"),
    SeqRecord(Seq("ACTGCTAGDTAG"), id="Gamma"),
])
print(align1.format("phylip"))

I am not sure how to proceed from here.

The output should be a dictionary whose keys are the alignment columns (as strings) and whose values are lists of the indexes at which each identical column occurs.
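To make the target concrete, here is a minimal plain-Python sketch that does not use Biopython at all. It assumes the aligned sequences are available as equal-length strings (`zip` silently stops at the shortest one otherwise); `identical_columns` is a hypothetical helper name. `zip(*sequences)` transposes the rows into columns, and each column is joined back into a string to serve as the key.

```python
def identical_columns(sequences):
    """Map each alignment column (as a string) to the indexes where it occurs.

    Assumes `sequences` is a list of equal-length aligned strings.
    """
    columns = {}
    # zip(*sequences) yields one tuple of characters per alignment column
    for index, column in enumerate(zip(*sequences)):
        key = "".join(column)
        # setdefault creates the empty list the first time a column is seen
        columns.setdefault(key, []).append(index)
    return columns


aligned = ["ACTGCTAGCTAG", "ACT-CTAGCTAG", "ACTGCTAGDTAG"]
print(identical_columns(aligned))
# {'AAA': [0, 6, 10], 'CCC': [1, 4], 'TTT': [2, 5, 9], 'G-G': [3], 'GGG': [7, 11], 'CCD': [8]}
```

For the three sequences in the question, the gapped column 3 (`G-G`) and the column containing the ambiguity code `D` (`CCD`) each occur only once, while the other columns repeat.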

acattle
  • I don't really understand what you're trying to do here, but assuming you know how to find the string for the dictionary's key and the list of values, it'd be as simple as looping over the data and filling the `dict`. – TankorSmash Feb 13 '13 at 06:34

2 Answers


You can access a column with align1[:, index], which returns that column as a string, so it can be used directly as a dictionary key.

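To build the dictionary, use this loop:

columns = {}  # avoid naming it dict, which shadows the built-in type
for i in range(align1.get_alignment_length()):
    column = align1[:, i]  # column i as a string, e.g. "AAA"
    if column in columns:
        columns[column].append(i)
    else:
        columns[column] = [i]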
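The same loop can be written a little more compactly with `collections.defaultdict`. The sketch below wraps it in a hypothetical `group_identical_columns` function. It only relies on the two operations used above (`get_alignment_length()` and `alignment[:, i]` returning a column string), so it should accept a `MultipleSeqAlignment` such as `align1`. The `repeated_only` flag is an assumption about the question: if "identical columns" means only columns that occur more than once, it drops the singletons.

```python
from collections import defaultdict


def group_identical_columns(alignment, repeated_only=False):
    """Group column indexes by column content.

    `alignment` is expected to behave like Bio.Align.MultipleSeqAlignment:
    get_alignment_length() gives the number of columns and alignment[:, i]
    returns column i as a string.
    """
    groups = defaultdict(list)  # missing keys start as an empty list
    for i in range(alignment.get_alignment_length()):
        groups[alignment[:, i]].append(i)
    if repeated_only:
        # keep only columns that appear at least twice in the alignment
        return {col: idx for col, idx in groups.items() if len(idx) > 1}
    # convert to a plain dict so later lookups of unknown keys raise KeyError
    return dict(groups)
```

Usage with the alignment from the question would be `group_identical_columns(align1)` or `group_identical_columns(align1, repeated_only=True)`.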
MoRe

I can't upvote the answer by user1 above because my reputation is too low, but that is the correct way to access the columns of an MSA.

Read in your alignment using AlignIO:

from Bio import AlignIO
align1 = AlignIO.read("alignment.aln", "clustal")  # accepts a filename or an open handle

Then create the dictionary as described in user1's post.

You can also slice the MSA by rows and columns at the same time:

align1[0:1, 0:10]

Here the first index (0:1) selects the first row (sequence) of the alignment and the second index (0:10) selects the first 10 positions, so the result is a smaller alignment rather than a single column string.

mu0u506a