2

I have pairs of coding DNA sequences which I wish to perform pairwise codon alignments via Python, I have "half completed" the process.

So far..

  • I retrive pairs of orthologous DNA sequences from genbank using Biopython package.
  • I translate the orthologous pairs into peptide sequences and then align them using EMBOSS Needle program.

I wish to..

  • Transfer the gaps from the peptide sequences into the original DNA sequences.

Question

I would appreciate suggestions for programs/code (called from Python) that can transfer gaps from aligned peptide sequence pairs onto codons of the corresponding nucleotide sequence pairs. Or programs/code that can carry out the pairwise codon alignment from scratch.

enter image description here

hello_there_andy
  • 2,039
  • 2
  • 21
  • 51
  • Well I have been doing some searching around and found PAL2NAL which does what I need but over a web-server - but I need to do it via Python.. I am tempted to make my own script but I thought perhaps it is out there already or maybe some obvious function in Biopython (e.g. along the lines of "addGapsFromPeptide()" or something) – hello_there_andy Dec 30 '13 at 17:07
  • There was a google summer of code project to work on this in Biopython - http://lists.open-bio.org/pipermail/biopython-dev/2013-July/010718.html. Maybe this could help? – Stedy Dec 30 '13 at 18:04
  • @Stedy Despite the name, I don't think CodonAlign (a [BioPython fork](https://github.com/zruan/biopython/tree/master/Bio/CodonAlign) developed by [Zheng Ruan](http://zruanweb.com/) during GSoC2013) is actually a 'true' codon aligner. Looks like a neat tools for doing analysis on codon-aligned sequences, which can also do the "align proteins and map back to nucleotides" trick. See my other comment below for more info. – Owen May 16 '14 at 19:19

4 Answers4

2

All you need to do is split the nucleotide sequence into triplets. Each amino-acid is a triplet, each gap is three gaps. so in pseudo code:

for x in range(0, len(aminoacid)):
    if x != "-":
       print nucleotide[3x:3x+3]
    else:
       print "---"
Stylize
  • 1,058
  • 5
  • 16
  • 32
  • thank you, I managed to play with your answer to make it work :) – hello_there_andy Jan 01 '14 at 17:11
  • I was also using this approach but it's not really a trye codon-alignment algorithm. It's just an amino acid alignment mapped back to the codons, and [it won't work 'right' in some cases](http://biocozy.blogspot.com/2014/05/hunting-for-perfect-codon-aware-aligner.html). I find that [PRANK](http://code.google.com/p/prank-msa/wiki/PRANK) works but talking to it from a Python script is awkward. A topic probably better suited for [BioStars](https://www.biostars.org/p/2576/). – Owen May 16 '14 at 19:08
  • sorry for typo above: trye => true – Owen May 16 '14 at 19:24
1

You can make a mapping of peptides to nucleotides with the addition of your missing character:

codons = str.maketrans({'M' : 'ATG',
                        'R' : 'CGT',
                        ...,
                        '-' : '---'}) # Your missing character

peptide = 'M-R'
result = peptide.translate(codons)

and then translate the full sequence.

apai
  • 489
  • 4
  • 8
  • 2
    i think the problem here would be that amino acids can be coded for by multiple codons, i.e. CGT is not the only thing that R can be coded by, but you gave me some inspiration to try my own script so thank you – hello_there_andy Dec 30 '13 at 17:46
1

I understand you've asked this question three years ago, but this post is the first thing I find with my google search 'codon alignment python'. Therefore, I wanted to respond to this for everyone that might stumble upon this still looking for a library to do this.

You can use the library PyCogent for this.

They explain it well on their website: http://pycogent.org/examples/align_codons_to_protein.html

Ruben
  • 11
  • 2
0

In the end I made my own Python function, thought I may as well share it.

It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence and gives an aligned nucleotide sequence:

Function

def gapsFromPeptide( peptide_seq, nucleotide_seq ):
    """ Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment) 
          - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
          - nucleotide_seq is an un-aligned dna sequence whose codons translate to peptide seq"""
    def chunks(l, n):
        """ Yield successive n-sized chunks from l."""
        for i in xrange(0, len(l), n):
            yield l[i:i+n]
    codons = [codon for codon in chunks(nucleotide_seq,3)]  #splits nucleotides into codons (triplets) 
    gappedCodons = []
    codonCount = 0
    for aa in peptide_seq:  #adds '---' gaps to nucleotide seq corresponding to peptide
        if aa!='-':
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            gappedCodons.append('---')
    return(''.join(gappedCodons))

Usage

>>> unaligned_dna_seq = 'ATGATGATG'
>>> aligned_peptide_seq = 'M-MM'
>>> aligned_dna_seq = gapsFromPeptide(aligned_peptide_seq, unaligned_dna_seq)
>>> print(aligned_dna_seq)

    ATG---ATGATG
hello_there_andy
  • 2,039
  • 2
  • 21
  • 51