As preparation for an upcoming bioinformatics course, I am doing some assignments from rosalind.info. I am currently stuck in the assignment "Mendel's First Law".
I think I could brute force myself through this, but that somehow my thinking must be too convoluted. My approach would be this:
Build a tree of probabilities which has three levels. There are two creatures that mate, creature A and creature B. First level is, what is the probability for picking as creature A homozygous dominant (k), heterozygous (m) or homozygous recessive (n). It seems that for example for homozygous dominant, since there are a total of (k+m+n) creatures and k of them are homozygous dominant, the probability is k/(k+m+n).
Then in this tree, under each of these would come the probability of creature B being k / m / n given that we know what creature A got picked as. For example if creature A was picked to be heterozygous (m), then the probability that creature B would also be heterozygous is (m-1)/(k+m+n-1) because there is now one less heterozygous creature left.
This would give the two levels of probabilities, and would involve a lot of code just to get this far, as I would literally be building a tree structure and for each branch have manually written code for that part.
Now after choosing creatures A and B, each of them has two chromosomes. One of these chromosomes can randomly be picked. So for A chromosome 1 or 2 can be picked and same for B. So there are 4 different options: pick 1 of A, 1 of B. Pick 2 of A, 1 of B. Pick 1 of A, 2 of B. Pick 2 of A, 2 of B. The probability of each of these would be 1/4. So finally this tree would have these leaf probabilities.
Then from there somehow by magic I would add up all of these probabilities to see what is the probability that two organisms would produce a creature with a dominant allele.
I doubt that this assignment was designed to take hours to solve. What am I thinking too hard?
Update:
Solved this in the most ridiculous brute-force way possible. Just ran thousands of simulated matings and figured out the portion that ended up having a dominant allele, until there was enough precision to pass the assignment.
import random
k = 26
m = 18
n = 25
trials = 0
dominants = 0
while True:
s = ['AA'] * k + ['Aa'] * m + ['aa'] * n
first = random.choice(s)
s.remove(first)
second = random.choice(s)
has_dominant_allele = 'A' in [random.choice(first), random.choice(second)]
trials += 1
if has_dominant_allele:
dominants += 1
print "%.5f" % (dominants / float(trials))