I'm trying to evaluate the probabilities of end locations of random walks but I'm having some trouble with the speed of my program. Basically what I'm trying to do is take as an input a dictionary that contains the probabilities for a random walk( e.g. p = {0:0.5, 1:0.2. -1:0.3} meaning there's a 50% probability X stays at 0, a 20% probability X increases by 1, and a 30% probability X decreases by 1) and then calculate the probabilities for all the possible future states after n iterations.
So for example if p = {0:0.5, 1:0.2. -1:0.3} and n = 2 then it will return {0:0.37, 1:0.2, -1:0.3, 2:0.04, -2:0.09} if p = {0:0.5, 1:0.2. -1:0.3} and n = 1 then it will return {0:0.5, 1:0.2. -1:0.3}
I have working code, and it runs relatively quickly if n is low and if the p dictionary is small, but when n > 500 and the dictionary has around 50 values it takes upwards of 5 minutes to calculate. I'm guessing this is because it does it only on one processor so I went ahead and modified it so it would use python's multiprocessing module (as I read that multithreading doesn't improve parallel computing performance because of GIL).
My problem is, that there is not much improvement with multiprocessing, now I'm not sure if it's because I'm implementing it wrong or because of the overhead of multiprocessing in python. I'm just wondering if there's a library somewhere that evaluates all the probabilities of all the possibilities of a random walk when n > 500 in parallel? My next step if I can't find anything is to write my own function as an extension in C but it will be my first time doing it and although I've coded in C before it has been a while.
Original Non MultiProcessed Code
def random_walk_predictor(probabilities_tree, period):
ret = probabilities_tree
probabilities_leaves = ret.copy()
for x in range(period):
tmp = {}
for leaf in ret.keys():
for tree_leaf in probabilities_leaves.keys():
try:
tmp[leaf + tree_leaf] = (ret[leaf] * probabilities_leaves[tree_leaf]) + tmp[leaf + tree_leaf]
except:
tmp[leaf + tree_leaf] = ret[leaf] * probabilities_leaves[tree_leaf]
ret = tmp
return ret
MultiProcessed code
from multiprocessing import Manager,Pool
from functools import partial
def probability_calculator(origin, probability, outp, reference):
for leaf in probability.keys():
try:
outp[origin + leaf] = outp[origin + leaf] + (reference[origin] * probability[leaf])
except KeyError:
outp[origin + leaf] = reference[origin] * probability[leaf]
def random_walk_predictor(probabilities_leaves, period):
probabilities_leaves = tree_developer(probabilities_leaves)
manager = Manager()
prob_leaves = manager.dict(probabilities_leaves)
ret = manager.dict({0:1})
p = Pool()
for x in range(period):
out = manager.dict()
partial_probability_calculator = partial(probability_calculator, probability = prob_leaves, outp = out, reference = ret.copy())
p.map(partial_probability_calculator, ret.keys())
ret = out
return ret.copy()