I want to build a consensus sequence from several sequences in Python, and I'm looking for the most efficient / most Pythonic way to do it.
I have a list of strings like this:
sequences = ["ACTAG", "-TTCG", "CTTAG"]
I furthermore have an alphabet like this:
alphabet = ["A", "C", "G", "T"]
and a position frequency matrix like this (one row per position, columns in alphabet order; the gap character "-" is not counted):
[A C G T]
1 1 0 0
0 1 0 2
0 0 0 3
2 1 0 0
0 0 3 0
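For reference, here is a minimal sketch of how a matrix like this could be derived from the inputs above. The function name position_frequency_matrix is only an illustration, and it assumes all sequences have the same length and that any character outside the alphabet (such as "-") is simply ignored.

```python
sequences = ["ACTAG", "-TTCG", "CTTAG"]
alphabet = ["A", "C", "G", "T"]

def position_frequency_matrix(sequences, alphabet):
    # zip(*sequences) yields one tuple of characters per position (column).
    # For each column, count every alphabet symbol; gaps are never counted
    # because "-" is not in the alphabet.
    return [[column.count(symbol) for symbol in alphabet]
            for column in zip(*sequences)]

pfm = position_frequency_matrix(sequences, alphabet)
for row in pfm:
    print(*row)
```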
If one character occurs most often at a position, that character is used in the consensus sequence.
When two or more characters are tied for the highest count at a position, the corresponding ambiguity code is used instead (in this example, position 0 => A or C = M, see IUPAC codes).
The expected consensus sequence for my example is therefore "MTTAG".
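To make the rule concrete, this is a rough sketch of what I have in mind, not necessarily the most efficient version. The names IUPAC and consensus are my own, the lookup table uses the standard IUPAC nucleotide ambiguity codes, and I am assuming that a position with no counts at all (only gaps) should come out as "N".

```python
alphabet = ["A", "C", "G", "T"]
pfm = [
    [1, 1, 0, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 3],
    [2, 1, 0, 0],
    [0, 0, 3, 0],
]

# Standard IUPAC nucleotide codes, keyed by the set of tied bases.
IUPAC = {frozenset(bases): code for bases, code in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}

def consensus(pfm, alphabet):
    result = []
    for row in pfm:
        top = max(row)
        # Collect every base whose count equals the maximum (ties included).
        # Assumption: an all-zero row ties all four bases and maps to "N".
        tied = frozenset(base for base, count in zip(alphabet, row) if count == top)
        result.append(IUPAC[tied])
    return "".join(result)

print(consensus(pfm, alphabet))
```

Using a frozenset as the dictionary key means the order of the tied bases does not matter, so the lookup does not depend on the column order of the alphabet.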
EDIT:
What is the most efficient / most Pythonic way to get this consensus sequence from the given alphabet and position frequency matrix?