
I have a peptide frame of 9 amino acids (AA). I want to generate all possible peptides by replacing at most 3 AA in this frame, i.e. by replacing exactly 1, 2, or 3 AA.

The frame is CKASGFTFS, and I want to see all the mutants obtained by replacing at most 3 AA with residues from the pool of 20 AA.

We have a pool of 20 different AA (A, R, N, D, E, G, C, Q, H, I, L, K, M, F, P, S, T, W, Y, V).

I am new to coding, so can someone help me out with how to code this in Python or Biopython?

The output is supposed to be a list of unique sequences like the ones below:

CKASGFTFT, CTTSGFTFS, CTASGKTFS, CTASAFTWS, CTRSGFTFS, CKASEFTFS, and so on, each with 1, 2, or 3 substitutions from the pool of AA and the rest of the frame unchanged.

2 Answers


OK, so after my code finished, I worked the calculations backwards:

Case 1 is 9c1 x 19 = 171

Case 2 is 9c2 x 19 x 19 = 12,996

Case 3 is 9c3 x 19 x 19 x 19 = 576,156

That's a total of 589,323 combinations.

Here is the code for all 3 cases; you can run them sequentially.

You also asked to join the array into a single string, so I have updated my code to reflect that.

import copy
original = ['C','K','A','S','G','F','T','F','S']
possibilities = ['A','R','N','D','E','G','C','Q','H','I','L','K','M','F','P','S','T','W','Y','V']
storage=[]
counter=1

# case 1
for i in range(len(original)):
    for x in range(20):
        temp = copy.deepcopy(original)
        if temp[i] == possibilities[x]:
            pass
        else:
            temp[i] = possibilities[x]
            storage.append(''.join(temp))
            print(counter,''.join(temp))
            counter += 1

# case 2
for i in range(len(original)):
    for j in range(i+1,len(original)):
        for x in range(len(possibilities)):
            for y in range(len(possibilities)):
                temp = copy.deepcopy(original)
                if temp[i] == possibilities[x] or temp[j] == possibilities[y]:
                    pass
                else:
                    temp[i] = possibilities[x]
                    temp[j] = possibilities[y]
                    storage.append(''.join(temp))
                    print(counter,''.join(temp))
                    counter += 1

# case 3
for i in range(len(original)):
    for j in range(i+1,len(original)):
        for k in range(j+1,len(original)):
            for x in range(len(possibilities)):
                for y in range(len(possibilities)):
                    for z in range(len(possibilities)):
                        temp = copy.deepcopy(original)
                        if temp[i] == possibilities[x] or temp[j] == possibilities[y] or temp[k] == possibilities[z]:
                            pass
                        else:
                            temp[i] = possibilities[x]
                            temp[j] = possibilities[y]
                            temp[k] = possibilities[z]
                            storage.append(''.join(temp))
                            print(counter,''.join(temp))
                            counter += 1

The output looks like this (just the beginning and the end).

The results are also saved to a variable named storage, which is a native Python list.

1 AKASGFTFS
2 RKASGFTFS
3 NKASGFTFS
4 DKASGFTFS
5 EKASGFTFS
6 GKASGFTFS
...
...
...
589318 CKASGFVVF
589319 CKASGFVVP
589320 CKASGFVVT
589321 CKASGFVVW
589322 CKASGFVVY
589323 CKASGFVVV

It takes around 10 to 20 minutes to run, depending on your computer.

It displays all the combinations, skipping any candidate in which a replacement equals the original AA at that position: one position is checked in case 1, two in case 2, and three in case 3.

The three loops above both print the sequences and store them in a list, so they can be memory intensive and CPU intensive.

You could reduce the memory footprint of the stored strings by replacing the letters with numbers, since they might take less space. You could even consider using something like pandas or appending to a CSV file on disk.
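Another way to keep memory use down, offered here as a sketch and not as part of the original code, is to produce the sequences lazily with a generator built on `itertools.combinations` and `itertools.product`. The function name `mutants` and its parameters are hypothetical; the pool and frame are the ones from the question.

```python
from itertools import combinations, product

POOL = "ARNDEGCQHILKMFPSTWYV"  # the 20 amino acids listed in the question


def mutants(frame, pool=POOL, max_subs=3):
    """Yield every sequence differing from `frame` at 1..max_subs positions."""
    for n in range(1, max_subs + 1):
        # Choose which positions of the frame to substitute.
        for positions in combinations(range(len(frame)), n):
            # At each chosen position, allow only residues that differ
            # from the original one (19 options when the residue is in the pool).
            choices = [[aa for aa in pool if aa != frame[p]] for p in positions]
            for replacement in product(*choices):
                seq = list(frame)
                for p, aa in zip(positions, replacement):
                    seq[p] = aa
                yield "".join(seq)


# Single substitutions only, to keep the demonstration small.
single = list(mutants("CKASGFTFS", max_subs=1))
print(len(single), single[:3])
```

Because the function is a generator, nothing is held in memory unless the caller collects the results, for example with `list(mutants("CKASGFTFS"))`.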

You can iterate over the storage variable to go through the strings if you wish, like this:

for i in storage:
    print(i)

Or you can convert it to a pandas Series or DataFrame, or write it line by line directly to a CSV file on disk.
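As a rough illustration of the write-line-by-line option, the sketch below uses only the standard-library `csv` module. The helper name `write_sequences`, the `sequence` column header, and the demo file name are assumptions made for this example. The helper accepts any iterable of strings, so it would work with the `storage` list as well.

```python
import csv
import os
import tempfile


def write_sequences(path, sequences):
    """Write one sequence per row to `path` and return how many were written."""
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sequence"])  # header row (an assumption for this sketch)
        for seq in sequences:
            writer.writerow([seq])
            count += 1
    return count


# Tiny demonstration with three of the single-substitution mutants.
demo = (s for s in ["AKASGFTFS", "RKASGFTFS", "NKASGFTFS"])
path = os.path.join(tempfile.gettempdir(), "mutants_demo.csv")
written = write_sequences(path, demo)
print(written, "sequences written to", path)
```

Only one sequence is in flight at a time, so the file can grow to hundreds of thousands of rows without the full list ever being built in memory.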

anarchy
  • Thank you so much for your help. 1. The output is like **['A', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']**, but for further calculations I need it in **AKASGFTFS, AKASGFCCS, ...** format. Is it possible to get it like that? Also, please tell me how to store it. – shivam Gupta Dec 01 '21 at 04:53
  • Yes, that's easy, you can just merge them into a string. Hold on, let me add it. – anarchy Dec 01 '21 at 04:53
  • @shivamGupta I have updated my code, please take a look. Also, please mark my answer correct if it works. – anarchy Dec 01 '21 at 04:59
  • **Yes, that's easy, you can just merge them into a string. Hold on, let me add it.** Being new to coding, can you please tell me how to do it? That will be a great help. – shivam Gupta Dec 01 '21 at 05:01
  • @shivamGupta It's already in the code, please take a look. `''.join(temp)` merges the list into a string; you can see I have added it for you. Please look at the code properly. – anarchy Dec 01 '21 at 05:01
  • @shivamGupta I updated the code so the strings are saved to a `storage` variable. You can do a for loop over the storage variable to access the strings. – anarchy Dec 01 '21 at 07:02
  • Thanks. Do I need to check storage before appending to it, to avoid repeating mutants, i.e. `if not temp in storage:` `storage.append(temp)`? – shivam Gupta Dec 01 '21 at 07:14
  • You can give it a try if you want, @shivamGupta. If you figure it out on your own and experiment, you will learn faster. – anarchy Dec 01 '21 at 07:15
  • @shivamGupta It will take much longer to run, though, because you will be checking the list in every iteration. You could check for duplicates afterwards using a library like pandas. – anarchy Dec 01 '21 at 07:17
  • @shivamGupta Mathematically, though, I don't think there is overlap. – anarchy Dec 01 '21 at 07:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239723/discussion-between-shivam-gupta-and-anarchy). – shivam Gupta Dec 01 '21 at 09:15
  • @shivamGupta I am in the chat. By the way, I ran the code for you: there is no overlap, so checking for duplicates is redundant. – anarchy Dec 01 '21 at 09:17

Let's compute the total number of mutations that you are looking for.

Say you want to replace a single AA. There are 9 AAs in your frame, each of which can be changed into one of 19 other AAs. That's 9 * 19 = 171.

If you want to change two AAs, there are 9c2 = 36 ways to choose the two positions in your frame, and 19^2 ways to fill those positions from the pool. That gives us 36 * 19^2 = 12996.

Finally, if you want to change three, there are 9c3 = 84 ways to choose the positions and 19^3 ways to fill them from the pool. That gives us 84 * 19^3 = 576156.

Put it all together and you get 171 + 12996 + 576156 = 589323 possible mutations. Hopefully, this helps illustrate the scale of the task you are trying to accomplish!
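The arithmetic above can be reproduced in a few lines. The sketch below assumes Python 3.8 or later for `math.comb`. The function names `count_mutants` and `brute_force_count` are hypothetical, and the small 5-letter pool is used only to cross-check the formula by exhaustive enumeration. The formula also assumes every residue of the frame is itself in the pool, so each position has exactly `pool_size - 1` alternatives.

```python
from itertools import product
from math import comb


def count_mutants(length, pool_size=20, max_subs=3):
    """Number of sequences with 1..max_subs substitutions: sum of C(L, n) * (pool_size - 1)^n."""
    return sum(comb(length, n) * (pool_size - 1) ** n
               for n in range(1, max_subs + 1))


def brute_force_count(frame, pool, max_subs=3):
    """Count sequences over `pool` that differ from `frame` at 1..max_subs positions."""
    total = 0
    for candidate in product(pool, repeat=len(frame)):
        distance = sum(a != b for a, b in zip(candidate, frame))
        if 1 <= distance <= max_subs:
            total += 1
    return total


# The 9-residue frame with the 20-residue pool from the question.
print(count_mutants(9))

# Cross-check on a toy case small enough to enumerate exhaustively.
small_pool = "ACDEF"
print(count_mutants(4, pool_size=5), brute_force_count("ACDA", small_pool))
```

The two numbers on the last line should agree, which is the same no-overlap argument in computational form: each mutant is counted exactly once, by the set of positions where it differs from the frame.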

jozborn