1

I'm plotting some random walk functions on python and trying to create it so it takes the locations of the random walks every 1000 steps and then plots them in a histogram. I realise that I could literally create a new list for n=1000,2000 etc. each time and just append the values of the walk to that list, but is there a way I can get python to create a new list every 1000 steps?

import numpy as np 
import matplotlib.pyplot as plt

def random_walk(N,d):       
    walk = d*np.cumsum(2*np.random.binomial(1,.6,N)-1)
    return walk
n1=[]
n2=[]
n3=[]
n4=[]
n5=[]

for k in range(5000):
    particular_walk = random_walk(5000,2)
    n1.append(particular_walk[1000])
    n2.append(particular_walk[2000])
    n3.append(particular_walk[3000])
    n4.append(particular_walk[4000])
    n5.append(particular_walk[-1])


plt.hist(n1,bins=20,histtype='step',density=True)
plt.hist(n2,bins=20,histtype='step',density=True)
plt.hist(n3,bins=20,histtype='step',density=True)
plt.hist(n4,bins=20,histtype='step',density=True)
plt.hist(n5,bins=20,histtype='step',density=True)
plt.show()

This is the code I have so far, but I realise that it doesn't work. I know I could just have a list called say, "midpoint", and set this to the locations of the particular walk at 2500, but is there a way to do it automatically?

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
T. L
  • 103
  • 2
  • 8
  • What do you want the output to be, exactly? – Mad Physicist Oct 26 '18 at 12:38
  • I want multiple lists, each with all the values of the random walk at that point. And then I want it to plot the probabilities for all of these lists in the histogram. – T. L Oct 26 '18 at 12:45
  • Please show a sample output and the plot you expect to see – Mad Physicist Oct 26 '18 at 13:06
  • Specifically, are you trying to get 5 different histograms plotted or just one in the example above? – Mad Physicist Oct 26 '18 at 13:10
  • 5 histograms, I know how to do this manually - but say, if n changed to 10000 instead of 5000, is there a way I could get it to automatically produce 10 histograms in that case? On my phone right now but will show a sample in a bit. – T. L Oct 26 '18 at 13:41
  • Show the manual steps that work for you, and I'll show you how to automate them. – Mad Physicist Oct 26 '18 at 15:57
  • If I create a list, 'n1=[]' and then when running the for loop ' n1.append(particular_walk[1000])', then 'plt.hist(n1, bins=20, histtype='step', density=True)'. I do this for each intermediate point, so for n=2000, 3000, 4000, 5000 etc. – T. L Oct 26 '18 at 16:07
  • Are you missing a colon in there? Can you edit the question with all relevant samples and info please? – Mad Physicist Oct 26 '18 at 16:20
  • Changed it now, thank you so much for helping! – T. L Oct 26 '18 at 16:30
  • Last question: you say "but I realise that it doesn't work". Why? `particular_walk` is a 1D array. `particular_walk[1000]` is a scalar. That means that `n1` has the 1001st element of 5000 random walks. The histograms look fine. What doesn't work? – Mad Physicist Oct 26 '18 at 17:31

2 Answers2

0

Maybe you could use python dictionaries and create the lists inside the dictionaries instead.

It could be something like this:

if k%1000 == 0:
    rw_dict[str(k/1000)]=[]

Where str(k/1000) is just the name of the key inside the dictionary: 1.0, 2.0, etc. Whenever you need to do operations on that specific list, you can access the list with rw_dict[list_name].

I hope this helps!

Sergio
  • 205
  • 1
  • 11
0

As a rule, you want to rule out using lists when you are working with numpy. Arrays are much more efficient in terms of both time and space. You should also try to avoid using explicit Python loops, since the vectorization provided by numpy is going to be much faster.

Let's say you wanted to do this for Z different walks, where Z = 5000 in your sample, but you want it to be bigger in the general case. You can use the fact that most numpy functions can be applied to a particular axis to accomplish this:

import numpy as np 
from matplotlib import pyplot as plt

Z = 5000
N = 5000
d = 2

all_walks = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=(Z, N)) - 1, axis=1)

Setting the size of the steps to (Z, N) will give you a matrix containing Z rows, each row with N steps. Taking the cumulative sum with axis=1 means summing across the columns. Now each row is an independent walk. You can use very basic slicing to get any column you want. The nth column will contain the nth step from each of the Z walks. The reason that you want to make your slices columns is that plotting becomes much easier that way.

Let's take a look at n=1000:

n1k = all_walks[:, 1000]
plt.hist(n1k, bins=20, histtype='step', density=True)

So far so good. To take every 1000 samples, use the step size in the index:

n = 1000
samples = all_walks[:, n::n]

samples is now Zx(N//n)-1 ((5000, 4)) array containing the steps at indices 1000, 3000, 4000 in each array. You seem to want to have five total samples in this example. There are three reasonable ways to do that, in my opinion:

  1. Start at index 0: samples = all_walks[:, ::n]
  2. Start at index n-1: samples = all_walks[:, n-1::n]
  3. Add an extra step (column) to your walks:

    all_walks = ... size=(Z, N+1) ...
    samples = all_walks[:, n::n]
    

I don't think that messing with the spacing by inconsistently adding index -1 is a particularly good idea. I will use option #2 instead (remember that index 999 contains the 1000th step).

The good news is that matplotlib allows you to plot the columns of a vector all at once:

plt.hist(samples, bins=20, histtype='step', density=True)

So the whole script is actually quite short:

import numpy as np 
from matplotlib import pyplot as plt

Z = 5000
N = 5000
d = 2
n = 1000

all_walks = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=(Z, N)) - 1, axis=1)
samples = all_walks[:, n-1::n]
plt.hist(samples, bins=20, histtype='step', density=True)
plt.show()

If for some reason you could not hold an array of size Z * N floats in memory all at once, you could implement the line all_walks using a loop that generates one walk at a time, somewhat like you originally tried to do. Remember, this is only if you make Z some stupendously large number or if you have no RAM for some reason:

samples = np.empty((Z, N//n))
for row in range(Z):
    walk = d * np.cumsum(2 * np.random.binomial(1, 0.6, size=N) - 1)
    samples[row] = walk[n-1::n]

If you start with the same random seed, both methods should give you identical results. The main difference will be that the first takes more memory, while the second takes more time.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264