I want to simulate a simple birth-death process using the Gillespie algorithm (https://en.wikipedia.org/wiki/Gillespie_algorithm) in Python. At each instant there is a rate a of birth and a rate b of death per individual. I believe the following code should simulate the dynamics of the population n:
import numpy as np

a = 10.0  # birth rate
b = 1.0   # death rate (per individual)
n = 0     # initial population
t = 0.0   # initial time
T = []    # times of births or deaths
N = []    # population timeseries

for _ in range(1000000):  # do 1000000 iterations
    P = [a, b*n]  # rates of birth and death
    P0 = sum(P)   # total rate
    # step the time
    t = t + (1/P0)*np.log(1/np.random.random())
    # select the transition
    R = P[0]/P0   # probability of birth
    r = np.random.random()  # choose a random number
    # enact the transition
    if r < R:   # birth
        n = n + 1
    else:       # death
        n = n - 1
    N.append(n)  # update the output lists
    T.append(t)
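As an aside, my understanding is that the time-step line is just an inverse-transform sample of an exponential waiting time, so it could equivalently be written with NumPy's built-in sampler (a sketch, assuming the two draws are interchangeable):

# equivalent waiting-time draw: log(1/U) with U ~ Uniform(0,1) samples Exp(1),
# and scaling by 1/P0 gives an exponential with rate P0
t = t + np.random.exponential(scale=1/P0)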
This has the intended behavior: plotting N versus T shows a random population with well-defined statistical patterns. However, I have a serious source of confusion.
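For reference, this is roughly how I visualize it (assuming matplotlib is available; a step plot, since the population is piecewise constant between events):

import matplotlib.pyplot as plt

plt.step(T, N, where='post')  # N[i] holds from T[i] until the next event
plt.xlabel('time t')
plt.ylabel('population n')
plt.show()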
The analytical solution of this model says the mean value of N should be a/b, while this simulation consistently overshoots -- the error is systematic and is preserved for any choice of a and b. Increasing the number of iterations does not reduce it. I compute it as
(sum(N)/len(N) - a/b)/(a/b)*100  # error in the mean value, in percent
which always returns at least 10%.
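For completeness, my reasoning for the a/b prediction: the mean-field rate equation is

d<n>/dt = a - b*<n>

which vanishes at <n> = a/b. (If I have it right, this constant-birth, per-capita-death model is the immigration-death process, whose stationary distribution is Poisson with mean a/b.)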
What am I missing here? What is the source of this systematic error in the mean of my simulation? Am I misinterpreting np.random somehow? There has to be an issue with my code, since otherwise the error should scale as 1/sqrt(# iterations).