0

I'm wondering if anyone knows any tools which allow me to count the frequency of amino acids at any specific position in a multiple-sequence alignment.

For example if I had three sequences:

Species 1 - MMRSA 
Species 2 - MMLSA
Species 3 - MMRTA 

I'd like for a way to search by position for the following output:

Position 1 - M = 3;
Position 2 - M = 3;
Position 3 - R = 2, L = 1;
Position 4 - S = 2, T = 1;
Position 5 - A = 3.

Thanks! I'm familiar with R and Linux, but if there's any other software that can do this I'm sure I can learn.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Kodewings
  • 29
  • 7

2 Answers2

0

Using R:

x <- read.table(text = "Species 1 - MMRSA 
Species 2 - MMLSA
Species 3 - MMRTA")

ixCol = 1
table(sapply(strsplit(x$V4, ""), "[", ixCol))
# M 
# 3 

ixCol = 4
table(sapply(strsplit(x$V4, ""), "[", ixCol))
# S T 
# 2 1

Depending input file format, there are likely a purpose built bioconductor packages/functions.

zx8754
  • 52,746
  • 12
  • 114
  • 209
0

That is really easy to parse, you can use any language of choice. Here is an example in Python using a dict and Counter to assemble the data in a simple object.

from collections import defaultdict, Counter

msa = '''
Species 1 - MMRSA 
Species 2 - MMLSA
Species 3 - MMRTA
'''

r = defaultdict(list) #dictionary having the sequences index as key and the list of aa found at that index as value
for line in msa.split('\n'):
    line = line.strip()
    if line:
        sequence = line.split(' ')[-1]
        for i, aa in enumerate(list(sequence)):
            r[i].append(aa)
            
count = {k:Counter(v) for k,v in r.items()}
print(count)
#{0: Counter({'M': 3}), 1: Counter({'M': 3}), 2: Counter({'R': 2, 'L': 1}), 3: Counter({'S': 2, 'T': 1}), 4: Counter({'A': 3})}

To print the output as you specified:

for k, v in count.items():
    print(f'Position {k+1} :', end=' ') #add 1 to start counting from 1 instead of 0
    for aa, c in v.items():
        print(f'{aa} = {c};', end=' ')
    print()

It prints:

Position 1 : M = 3; 
Position 2 : M = 3; 
Position 3 : R = 2; L = 1; 
Position 4 : S = 2; T = 1; 
Position 5 : A = 3; 
alec_djinn
  • 10,104
  • 8
  • 46
  • 71