My first time using biopython. Forgive me if this is a basic question.
I would like to input sequences, then align them, and be able to refer to the index position of the original sequence (ungapped) and the aligned sequence (gapped).
My real world example is enolase (Uniprot P37869 and P0A6P9). The substrate binding lysine is index 392 in E. coli and 389 in B. subtilis. If one does a muscle alignment of the two, the index of this lysine in the alignment is 394. I want to be able to convert easily between gappend and ungapped indicies.
Example 1: What is the aligned index of E. coli residue #392? (answer 394 in the aligned sequence).
Example 2 I found a conserved residue in the alignment at 394. where is that in the original (ungapped) sequence? (answer 392 in E. coli and 389 in B. subtilis).
>>>cline = MuscleCommandline(input="enolase.txt", out="enolase.aln")
>>>cline()
>>> alignment = AlignIO.read("enolase.aln","fasta")
>>> print(alignment[:,390:])
SingleLetterAlphabet() alignment with 2 rows and 45 columns
AGQIKTGAPSRTDRVAKYNQLLRIEDQLAETAQYHGINSFYNLNK sp|P37869|ENO_BACSU
AGQIKTGSMSRSDRVAKYNQLIRIEEALGEKAPYNGRKEIKGQA- sp|P0A6P9|ENO_ECOLI
>>> print(alignment[:,394])
KK