As a rule, you want to rule out using lists when you are working with numpy. Arrays are much more efficient in terms of both time and space. You should also try to avoid using explicit Python loops, since the vectorization provided by numpy is going to be much faster.
Let's say you wanted to do this for Z
different walks, where Z = 5000
in your sample, but you want it to be bigger in the general case. You can use the fact that most numpy functions can be applied to a particular axis to accomplish this:
import numpy as np
from matplotlib import pyplot as plt
Z = 5000
N = 5000
d = 2
all_walks = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=(Z, N)) - 1, axis=1)
Setting the size of the steps to (Z, N)
will give you a matrix containing Z
rows, each row with N
steps. Taking the cumulative sum with axis=1
means summing across the columns. Now each row is an independent walk. You can use very basic slicing to get any column you want. The n
th column will contain the n
th step from each of the Z
walks. The reason that you want to make your slices columns is that plotting becomes much easier that way.
Let's take a look at n=1000
:
n1k = all_walks[:, 1000]
plt.hist(n1k, bins=20, histtype='step', density=True)
So far so good. To take every 1000 samples, use the step size in the index:
n = 1000
samples = all_walks[:, n::n]
samples
is now Zx(N//n)-1 ((5000, 4)
) array containing the steps at indices 1000, 3000, 4000
in each array. You seem to want to have five total samples in this example. There are three reasonable ways to do that, in my opinion:
- Start at index
0
: samples = all_walks[:, ::n]
- Start at index
n-1
: samples = all_walks[:, n-1::n]
Add an extra step (column) to your walks:
all_walks = ... size=(Z, N+1) ...
samples = all_walks[:, n::n]
I don't think that messing with the spacing by inconsistently adding index -1
is a particularly good idea. I will use option #2 instead (remember that index 999 contains the 1000th step).
The good news is that matplotlib allows you to plot the columns of a vector all at once:
plt.hist(samples, bins=20, histtype='step', density=True)
So the whole script is actually quite short:
import numpy as np
from matplotlib import pyplot as plt
Z = 5000
N = 5000
d = 2
n = 1000
all_walks = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=(Z, N)) - 1, axis=1)
samples = all_walks[:, n-1::n]
plt.hist(samples, bins=20, histtype='step', density=True)
plt.show()
If for some reason you could not hold an array of size Z * N
floats in memory all at once, you could implement the line all_walks
using a loop that generates one walk at a time, somewhat like you originally tried to do. Remember, this is only if you make Z
some stupendously large number or if you have no RAM for some reason:
samples = np.empty((Z, N//n))
for row in range(Z):
walk = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=N) - 1)
samples[row] = walk[n-1::n]
If you start with the same random seed, both methods should give you identical results. The main difference will be that the first takes more memory, while the second takes more time.