
I have a list that is built in a for loop, using itertools.product() to find the different combinations of letters. I want to use collections.Counter() to count the number of occurrences of each item. However, right now it prints every combination of "A"s and "G"s separately:

['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'g']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'a', 'G']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
# etc.

Now, this isn't all of them, but as you can see, there are some occurrences that are the same although ordered differently, for example:

['a', 'G', 'A', 'G']
['a', 'A', 'G', 'G']

I would much prefer the latter ordering, so I want a way to print every combination with the letters in alphabetical order ('a' before 'g') and, within each letter, capitals before lower case. The final product should look like ['AaGG', 'aaGg', etc]. What function or functions should I use?

This is the code that generates the data. The section marked "Counting" is what I'm having trouble with.

import itertools
from collections import Counter
parent1 = 'aaGG'
parent2 = 'AaGg'
f1 = []
f1_ = []
genotypes = []
b = []
genetics = []
g = []
idx = []

parent1 = list(itertools.combinations(parent1, 2))    
del parent1[0]
del parent1[4] 

parent2 = list(itertools.combinations(parent2, 2))    
del parent2[0]
del parent2[4]


for x in parent1:
    f1.append(''.join(x))

for x in parent2:
    f1_.append(''.join(x))

y = list(itertools.product(f1, f1_))  

for x in y:
    genotypes.append(''.join(x))
    break
genotypes = [
        thingies[0][0] + thingies[1][0] + thingies[0][1] + thingies[1][1]
        for thingies in zip(parent1, parent2)
] * 4
print 'F1', Counter(genotypes)

# Counting
for genotype in genotypes:
    alleles = list(itertools.combinations(genotype,2))
    del alleles[1]
    del alleles[3]
    for x in alleles:
        g.append(''.join(x))

for idx in g:
    if idx.lower().count("a") == idx.lower().count("g") == 1:
        break                

f2 = list(itertools.product(g, g)) 

for x in f2:
    genetics.append(''.join(x)) 

for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
    print Counter(genes)
jscs

3 Answers

3

I think you're looking for a customized way to define precedence. Python's default ordering compares characters by their ASCII codes, so every uppercase letter sorts before every lowercase letter ('G' comes before 'a'). I would define a custom precedence using a dictionary:

>>> test_list = ['a', 'A', 'g', 'G']
>>> precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
>>> test_list.sort(key=lambda x: precedence_dict[x])
>>> test_list
['A', 'a', 'G', 'g']

Edit: Your last few lines:

for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
    print Counter(genes)

were not doing what you wanted them to.

Replace those lines with:

precedence_dict = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

# Sort the letters of each genotype by the custom precedence.
for i in xrange(len(genetics)):
    genetics[i] = ''.join(sorted(genetics[i], key=lambda x: precedence_dict[x]))

# Drop duplicates with the built-in set (the sets module is deprecated),
# then sort the remaining genotypes.
genetics = sorted(set(genetics))

print genetics

and I think you have the correct solution. Assigning to the loop variable inside a for loop only rebinds the name 'genes' to a new string; it does not change the element stored in the list. So the original list was never actually modified.
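The rebinding behaviour and the fix can be seen in a small sketch. It is written for Python 3 (the code above is Python 2), and `normalize`, `PRECEDENCE`, and the sample `genetics` list are illustrative names and data, not part of the original code.

```python
from collections import Counter

# Custom precedence from the answer: 'a' letters before 'g', capitals first.
PRECEDENCE = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

def normalize(genotype):
    """Return the genotype's letters sorted by PRECEDENCE (hypothetical helper)."""
    return ''.join(sorted(genotype, key=PRECEDENCE.__getitem__))

# Made-up sample data standing in for the real `genetics` list.
genetics = ['aGAG', 'aAGG', 'gGaa']

# Rebinding the loop variable leaves the list untouched.
for genes in genetics:
    genes = normalize(genes)
unchanged = list(genetics)

# Building a new list keeps the normalized strings.
normalized = [normalize(genes) for genes in genetics]

# Counter now tallies genotypes that differ only in letter order together.
counts = Counter(normalized)
print(unchanged)
print(normalized)
print(counts)
```

Normalizing before counting is what lets `Counter` treat 'aGAG' and 'aAGG' as the same genotype.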

Nikhil Shinday
  • Hmm, I tried this, and like abdi's code, it printed "None None None None". I think that I am doing something wrong, but I'm not sure what. Is there any way you could tell me what's wrong with what I'm doing? I usually don't ask for help like this, but I need to finish this project _today_ – user5115709 Jul 23 '15 at 15:06
  • is genetics the list of lists? – Nikhil Shinday Jul 23 '15 at 15:36
  • Yes. Genetics is the list of all the different combinations of a's and g's. Each combination is its own list, so yes, genetics is the list of lists. Thanks again for looking into this – user5115709 Jul 23 '15 at 15:38
  • Also, it's giving me the error: Traceback (most recent call last): File "Untitled.py", line 73, in test_list.sort(key = lambda x: precedence_dict[x]) File "Untitled.py", line 73, in test_list.sort(key = lambda x: precedence_dict[x]) KeyError: 'aAaA' – user5115709 Jul 23 '15 at 15:44
  • Make sure you're passing the correct data to the sort function. It should be a list of separate letter values (as in the `test_list` in the answer), not the list of values all concatenated. This works, but it will return `None` if it can't find the element in the passed dict as a key in the precedence list; I assume that is the case. – Matthew R. Jul 23 '15 at 15:45
  • Your last few lines: for genes in genetics: if genes.lower().count("a") == genes.lower().count("g") == 2: genes = ''.join(genes) print Counter(genes) – Nikhil Shinday Jul 23 '15 at 15:49
  • So how do I make it so that Python only prints when _both_ specifications ("a"'s == 2 and "g"'s == 2) are met? Because right now if there are two g's then its printing, but I only want it to print when there are two g's and two a's @MatthewR. – user5115709 Jul 23 '15 at 15:53
  • It looks like you handled that with the equality line near the end of your code. I was speaking about passing the correct list, but you accepted the answer so it seems like you figured out what was wrong :) – Matthew R. Jul 23 '15 at 16:54
  • Hi @MatthewR. ! I didn't mean to confuse anyone. The reason that I accepted the answer was because it was the most clear, concise answer posted, and I knew that other people could benefit from it. However, in the edit that Nikhil posted, he deletes the 'for' loop making it so that there needed to be two a's and g's. Unless I'm mistaken, this is what I thought he meant. What can I do to put the for loop back in? – user5115709 Jul 23 '15 at 17:52
  • Whoop, sorry about that. @user5115709, you should look up Python's [filter](https://docs.python.org/2/library/functions.html#filter) function to try and filter out only the entries that you need. It is more efficient to do it on a **shorter** list, so I'd recommend filtering after my script snippet. – Nikhil Shinday Jul 23 '15 at 18:39
  • How would you do that? I am thoroughly confused @NikhilShinday – user5115709 Jul 23 '15 at 19:52
  • You are only including the genes that satisfy the condition that there are two of (A or a) and two of (G or g), so your filter function would look something like: genetics = filter(lambda x: x.lower().count("a") == x.lower().count("g") == 2, genetics) – Nikhil Shinday Jul 23 '15 at 20:27
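The filter suggested in the last comment could be sketched as follows. This uses Python 3 syntax, where `filter` returns an iterator, so a list comprehension is used instead; `has_two_of_each` and the sample list are made-up names and data for illustration.

```python
def has_two_of_each(genotype):
    """True when the string has exactly two a/A and two g/G letters."""
    lowered = genotype.lower()
    return lowered.count('a') == lowered.count('g') == 2

# Made-up sample data; the real list comes from the crosses above.
genetics = ['AaGG', 'aaGg', 'GGGg', 'AAaa', 'aGGg']

# Keep only the genotypes that satisfy both conditions at once.
kept = [g for g in genetics if has_two_of_each(g)]
print(kept)  # ['AaGG', 'aaGg']
```

The chained comparison requires both counts to equal 2, so a string with two g's but no a's is rejected.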
2

I know you didn't ask for a code review, but you might be better off just generating the strings in the order you want in the first place instead of trying to filter them afterwards. Something like this might work.

def cross(parent1, parent2):

    out = []
    alleles = len(parent1)/2

    # iterate parent 1 possible genotypes
    for i in range(2):

        # iterate loci 
        for k in range(alleles):
            child = []

            # iterate parent 2 possible genotypes
            for j in range(2):
                p1 = parent1[j * 2 + i]
                p2 = parent2[j * 2 + k]
                c = [p1, p2]

                # order the pair by case: sort() puts the capital first ('A' < 'a'),
                # then reverse() flips it, so the lowercase letter ends up first
                c.sort()
                c.reverse()
                child += c

            out.append("".join(child))
    return out


if __name__ == "__main__":

    parent1 = 'aaGG'
    parent2 = 'AaGg'

    # F1
    f1 = cross(parent1, parent2)
    print f1

    # F2
    f2 = []
    for p1 in f1:
        for p2 in f1:
            f2 += cross(p1, p2)
    print f2

Here's one way to get all combinations from a single parent. Start with the empty string and add the possibilities one by one.

def get_all_combos(allele_pair, gametes):
    # Take a list of partial gametes. Return an updated list that extends
    # each one with each of the two possibilities from an allele pair.

    updated_gametes = []
    for z in gametes:
       updated_gametes.append(z + allele_pair[0])
       updated_gametes.append(z + allele_pair[1])
    return updated_gametes

if __name__ == "__main__":

    parent1 = 'aaGG'
    parent2 = 'AaGg'

    alleles = len(parent2)/2
    gametes = [""]
    for a in range(alleles):
        allele_pair = parent2[a*2:a*2+2]
        gametes = get_all_combos(allele_pair, gametes)
    print gametes

Maybe you can figure out how to combine these two solutions to get what you want.
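One possible way to combine the two ideas, sketched in Python 3 under the assumption that a genotype is written as consecutive allele pairs (so 'AaGg' is the pairs 'Aa' and 'Gg'). The names `gametes`, `cross_gametes`, and `offspring` are hypothetical, and `itertools.product` stands in for the hand-written `get_all_combos` loop.

```python
import itertools
from collections import Counter

def gametes(parent):
    """All gametes of a parent: one allele chosen from each pair."""
    pairs = [parent[i:i + 2] for i in range(0, len(parent), 2)]
    return [''.join(choice) for choice in itertools.product(*pairs)]

def cross_gametes(g1, g2):
    """Join two gametes locus by locus, capital letter first in each pair."""
    # sorted() puts 'A' before 'a' because uppercase sorts first in ASCII.
    return ''.join(''.join(sorted(pair)) for pair in zip(g1, g2))

def offspring(parent1, parent2):
    """Count offspring genotypes over every pairing of gametes."""
    return Counter(
        cross_gametes(g1, g2)
        for g1 in gametes(parent1)
        for g2 in gametes(parent2)
    )

f1 = offspring('aaGG', 'AaGg')
print(gametes('AaGg'))
print(f1)
```

Because each pair is sorted as it is built, the genotypes come out already in the desired order and `Counter` can tally them directly, without a separate reordering pass.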

ate50eggs
  • Thank you so so so so so so much. I know, I know, I didn't ask for a code review because I knew that if I did someone would close it. But this helped me so much. Sadly, I can only upvote it. I can't give it the check because it would only help me, and not others. However, what I want is more complicated than what you gave me. For instance, in what I'm doing, there are two steps, and for each F1/F2/F3 and so on, first I need to find all the different combinations of the a's and g's for each person. For example, parent1 (aaGG) would be aG, aG, aG, aG. And then I find all the combinations of – user5115709 Jul 23 '15 at 16:13
  • (continued) those with the other parent's combinations. So F1 would really be the cross of aG, aG, aG, aG and AG, Ag, aG, ag. Sorry if that was super confusing. – user5115709 Jul 23 '15 at 16:14
  • I just updated the answer with an example of how to get all the combinations from one parent. – ate50eggs Jul 23 '15 at 17:33
  • I know that this is usually discouraged but I am on a really tight time frame right now to finish this because I have been working on it for the past week and my internship ends today. Could you please explain to me how to put these together because I am really confused – user5115709 Jul 23 '15 at 19:17
  • Check out what winds up in `gametes` when you run the second chunk of code. parent2 gives `['AG', 'Ag', 'aG', 'ag']`. Try parent1. I'm not sure what you need to do with those, I still don't totally understand your goal. Here's something that might help: if you have `p2 = ['AG', 'Ag', 'aG', 'ag']`, you can index it like a 2D list, so `p2[0] = 'AG'`, `p2[0][0] = 'A'` and `p2[0][1] = 'G'`. If you get the gamete lists for both parents, you should be able to get combos in whatever format you want. The first cross AG X aG would be `p1[0][0] + p2[0][0] + p1[0][1] + p2[0][1]` – ate50eggs Jul 23 '15 at 19:56
  • You can get each pair into capitalization order with sort and reverse like I have in the top code chunk. – ate50eggs Jul 23 '15 at 20:01
0

You can try using the list's sort method. Example of what I mean:

parent1 = "absdksakjcvjvugoh"
parent1sorted = list(parent1)
parent1sorted.sort()
print (parent1sorted)

The result you get is this : ['a', 'a', 'b', 'c', 'd', 'g', 'h', 'j', 'j', 'k', 'k', 'o', 's', 's', 'u', 'v', 'v']

Does this help you?

tldr: Convert string into list, Sort list

  • Hi abdi! I just tried plugging that in, but instead of doing it for the parents, because I think that that would mess up the rest of my code, I plugged it in for genes, i.e.: genes = ''.join(genes) geneslist = list.(genes) genessorted = geneslist.sort() It returned "None None None None None..." – user5115709 Jul 23 '15 at 14:59
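The "None" output reported in the comment is what `list.sort()` produces when its result is assigned: it sorts the list in place and returns `None`. A short sketch of the difference, in Python 3 syntax with a made-up sample string:

```python
genes = 'gAaG'  # made-up sample genotype

letters = list(genes)
result = letters.sort()   # sorts `letters` in place; `result` is None

# sorted() returns a new list instead, so it can be assigned or joined.
genes_sorted = ''.join(sorted(genes))

print(result)        # None
print(letters)       # ['A', 'G', 'a', 'g']
print(genes_sorted)  # AGag
```

Note that the default order here is still by ASCII code (all capitals first); getting 'AaGg' needs a custom key such as the precedence dictionary from the first answer.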