How can I map a vectorized function to a numpy array without using a for loop?

Question

So here's what I already have:

import numpy as np
import matplotlib.pyplot as plt

def monteCarloPi(n):
    np.random.seed()                 #seed the random number generator
    y = np.random.rand(n)*2 - 1      #n random samples on (-1,1)
    x = np.linspace(-1,1,n)          #x axis to plot against

    square = np.array([x,y])         #collecting axes as a single object

    mask1 = ((x**2 + y**2) < 1)      #filters

    hits = np.sum(mask1)             #calculating approximation
    ratio = hits/n
    pi_approx = ratio * 4

    return pi_approx

Here is what I would like to do:

x = np.arange(100,1000)
y = monteCarloPi(x)
plt.scatter(x,y)

However, when I run the above code block, I get the following error:

---------------------------------------------------------------------
TypeError                           Traceback (most recent call last)
<ipython-input-52-bf4dcedaa309> in <module>()
      1 x = np.arange(100,1000)
----> 2 y = monteCarloPi(x)
      3 plt.scatter(x,y)

    <ipython-input-51-8d5b36e22d4b> in monteCarloPi(n)
      1 def monteCarloPi(n):
      2     np.random.seed()                 #seed the random number generator
----> 3     y = np.random.rand(n)*2 - 1      #n random samples on (-1,1)
      4     x = np.linspace(-1,1,n)          #x axis to plot against
      5 

mtrand.pyx in mtrand.RandomState.rand()

mtrand.pyx in mtrand.RandomState.random_sample()

mtrand.pyx in mtrand.cont0_array()

TypeError: only integer scalar arrays can be converted to a scalar index

Based on my understanding of how broadcasting works in numpy, this should work. I could just use a for loop but that gets really slow really quickly as the number of samples goes up.

halp

`random.rand` only takes scalar values, numbers, the shape of the array it's supposed to generate. You are passing it an array with many values, `x`. The following line with `linspace` will also have problems with this `n`. I don't see what `broadcasting` has to do with this. — hpaulj, Oct 14 '18 at 04:40
okay... So what you're telling me is that the only way to run this simulation for many values of n is with a for loop? — datBoi, Oct 14 '18 at 15:08
I guess the reason I was thinking this was related to broadcasting is that the function outputs a scalar; what I want to happen is to put one array in and get a new one out. — datBoi, Oct 14 '18 at 15:28
The fast compiled numpy code works with rectangular arrays - each row has the same number of columns, and so on for higher dimensions. Broadcasting lets us do math on arrays like this with various combinations of dimensions. Vectorization, in the strict sense, means writing code that can operate with arrays with differing numbers of dimensions. It still loops, but it moves those loops down into compiled blocks of code. — hpaulj, Oct 14 '18 at 16:54
Inside your function, different values of `n` produce different size arrays, `y`, `x`, `square`. The `sum` compresses those back down to a scalar. It can't work with several `n` at once because that would involve `y` etc arrays of different sizes. `y` for n=100 can't be combined with `y` for n=200 into one 2d array. Same for `x`, etc. In `numpy` 'ragged arrays' are not fast. — hpaulj, Oct 14 '18 at 17:00
by the way, your code doesn't use the `square` array. Do you have clear idea as to why you create it? — hpaulj, Oct 14 '18 at 18:03

score 0 · Answer 1 · answered Feb 19 '19 at 18:51

Here is one option, where the maximum sample size is based, then subsampling occurs if start>0 (error handling not included).

import numpy as np
import matplotlib.pyplot as plt

def monteCarloPi(n,start=0,stride=1):
    np.random.seed()                 # seed the random number generator
    y = np.random.rand(n)*2 - 1      # n random samples on (-1,1)
    x = np.linspace(-1,1,n)          # x axis to plot against
    mask = ( x**2 + y**2 ) < 1       # masking
    samples = {}
    inds = arange(n)
    for k in range(n-start,n+1,stride):
        sub_inds = np.random.choice(inds,k,replace=False)
        sub_mask = mask[sub_inds]
        sub_hits = np.sum(sub_mask)
        ratio    = sub_hits/n
        pi_approx = ratio * 4
        samples[k]=pi_approx
    return pi_approx

This still requires a for loop, but it's handled inside the method quickly, since you're subsampling from one large random sample. To recreate your original call (running from n=100 to n=1000 [note that I am going up to n=1000 here]):

estimates = monteCarloPi(1000,start=900)
plt.plot(estimates.keys(),estimates.values())

You could of course pass the original x=arange(100,1001), but then there would need to be error checking in the method (to make sure an array or list was passed), and then n would be equal to the last element of x (n=x[-1]), and finally, the looping would be done over the elements of x (for k in x:).

How can I map a vectorized function to a numpy array without using a for loop?

1 Answers1