How do I vectorize the following loop in Numpy?

Question

"""Some simulations to predict the future portfolio value based on past distribution. x is 
   a numpy array that contains past returns. The interpolated_returns are the returns 
   generated from the cdf of the past returns to simulate future returns. The portfolio 
   starts with a value of 100. portfolio_value is filled up progressively as 
   the program goes through every loop. The value is multiplied by the returns in that 
   period and a dollar is removed."""

    portfolio_final = []
    for i in range(10000):
        portfolio_value = [100]
        rand_values = np.random.rand(600)
        interpolated_returns = np.interp(rand_values,cdf_values,x)
        interpolated_returns = np.add(interpolated_returns,1)

        for j in range(1,len(interpolated_returns)+1):
            portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
            portfolio_value[j] = portfolio_value[j]-1

        portfolio_final.append(portfolio_value[-1])
    print(np.mean(portfolio_final))

I couldn't find a way to vectorize this code with NumPy. I looked at iterating with `np.nditer`, but I couldn't make any progress with it.

Some questions: Is this supposed to be 10000 independent evaluations of how the portfolio will perform? Because as it is written now your portfolio value array should be 10000 * 600 values long but you only work with the first 600. Or am I misreading your code? Also what is portfolio in the line `portfolio_value.append(interpolated_returns[j-1]*portfolio[j-1])`? Is it a 600 values long array that you initialize once and work with afterwards? Or should it be portfolio_value instead such that you actually have an evolution of your portfolio? — B. Scholz, Jun 23 '16 at 22:31
I just read your comment and edited the code to correct it. Yes, it is 10,000 simulations. I want to create 10,000 sample paths of the portfolio's evolution. It can also be done as a 10,000x600 matrix. You are correct about that line of code. It holds the values of the portfolio as it evolves in a single simulation. It is reinitialized for every sample path (this was the mistake in the code). – Akshay Sakariya, Jun 23 '16 at 23:03

B. Scholz · Accepted Answer · 2016-06-24T16:56:53.570

I guess the easiest way to figure out how to vectorize this is to look at the equations that govern the evolution and see how your portfolio actually iterates, looking for patterns that can be vectorized, instead of trying to vectorize the code you already have. You would then have noticed that `cumprod` appears quite often in your iterations.
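
Worked out for the loop in the question, the pattern looks roughly like this. The symbols $p_n$ (portfolio value after step $n$), $r_n$ (one plus the interpolated return of step $n$) and $R_n$ (running product of the $r_k$) are notation introduced here for illustration, not names from the code, and the last line assumes that no $R_k$ is zero:

```latex
\begin{aligned}
p_n &= r_n\,p_{n-1} - 1, \qquad p_0 = 100,\\
p_1 &= r_1\,p_0 - 1,\\
p_2 &= r_1 r_2\,p_0 - r_2 - 1,\\
p_3 &= r_1 r_2 r_3\,p_0 - r_2 r_3 - r_3 - 1,\\
p_n &= R_n\,p_0 - \sum_{k=1}^{n} \frac{R_n}{R_k}
     = R_n\left(p_0 - \sum_{k=1}^{n} \frac{1}{R_k}\right),
\qquad R_n = \prod_{k=1}^{n} r_k .
\end{aligned}
```

Here $R_n$ is what `np.cumprod` computes and the running sum of $1/R_k$ is what `np.cumsum` computes, so the one-dollar withdrawal at every step is still accounted for.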

Nevertheless you can find the semi-vectorized code below. I included your code as well such that you can compare the results. I also included a simple loop version of your code which is much easier to read and translatable into mathematical equations. So if you share this code with somebody else I would definitely use the simple loop option. If you want some fancy-pants vectorizing you can use the vector version. In case you need to keep track of your single steps you can also add an array to the simple loop option and append the pv at every step.

Hope that helps.

Edit: I have not tested anything for speed. That's something you can easily do yourself with timeit.

import numpy as np
from scipy.special import erf

# Prepare simple return model - Normal distributed with mu & sigma = 0.01
x = np.linspace(-10,10,100)
cdf_values = 0.5*(1+erf((x-0.01)/(0.01*np.sqrt(2))))

# Prepare setup such that every code snippet uses the same number of steps
# and the same random numbers
nSteps = 600
nIterations = 1
rnd = np.random.rand(nSteps)

# Your code - Gives the (supposedly) correct results
portfolio_final = []
for i in range(nIterations):
    portfolio_value = [100]
    rand_values = rnd
    interpolated_returns = np.interp(rand_values,cdf_values,x)
    interpolated_returns = np.add(interpolated_returns,1)

    for j in range(1,len(interpolated_returns)+1):
        portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
        portfolio_value[j] = portfolio_value[j]-1

    portfolio_final.append(portfolio_value[-1])
print (np.mean(portfolio_final))

# Using vectors
portfolio_final = []
for i in range(nIterations):
    portfolio_values = np.ones(nSteps)*100.0
    rcp = np.cumprod(np.interp(rnd,cdf_values,x) + 1)
    portfolio_values = rcp * (portfolio_values - np.cumsum(1.0/rcp))
    portfolio_final.append(portfolio_values[-1])
print (np.mean(portfolio_final))

# Simple loop
portfolio_final = []
for i in range(nIterations):
    pv = 100
    rets = np.interp(rnd,cdf_values,x) + 1
    for j in range(nSteps):  # j, so the outer loop variable i is not shadowed
        pv = pv * rets[j] - 1
    portfolio_final.append(pv)
print (np.mean(portfolio_final))
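
The remaining loop over `nIterations` can probably be removed as well, by drawing all random numbers as one 2-D array and applying `np.cumprod` and `np.cumsum` along the time axis. The following is only a minimal sketch under stated assumptions: the function names `simulate_loop` and `simulate_vectorized`, the parameters `start` and `withdrawal`, and the synthetic past returns (normal with mean 0.02 and standard deviation 0.02, turned into an empirical CDF) are made up for the example and are not part of the original code.

```python
import numpy as np


def simulate_loop(growth, start=100.0, withdrawal=1.0):
    """Reference version: plain Python loops, one path at a time."""
    finals = np.empty(growth.shape[0])
    for i, path in enumerate(growth):
        pv = start
        for g in path:
            pv = pv * g - withdrawal
        finals[i] = pv
    return finals


def simulate_vectorized(growth, start=100.0, withdrawal=1.0):
    """All paths at once: cumprod and cumsum along the time axis (axis=1)."""
    rcp = np.cumprod(growth, axis=1)
    values = rcp * (start - withdrawal * np.cumsum(1.0 / rcp, axis=1))
    return values[:, -1]


# Synthetic stand-in for the past returns (an assumption for this example).
rng = np.random.default_rng(0)
x = np.sort(rng.normal(0.02, 0.02, size=250))
cdf_values = np.arange(1, x.size + 1) / x.size  # empirical CDF of x

n_paths, n_steps = 500, 600
rand_values = rng.random((n_paths, n_steps))

# Interpolate on the flattened draws, then restore the (paths, steps) shape.
growth = np.interp(rand_values.ravel(), cdf_values, x).reshape(n_paths, n_steps) + 1.0

loop_finals = simulate_loop(growth)
vec_finals = simulate_vectorized(growth)

print(np.allclose(loop_finals, vec_finals))
print(vec_finals.mean())
```

The division by `rcp` may lose precision when the cumulative product becomes very small or very large, so it seems worth checking the result against the plain loop on your own data, as done here. For 10,000 paths of 600 steps, each intermediate array holds 6 million floats (about 48 MB).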

Wow. This is beautiful. Works like a charm and fast. `timeit` gave me 2.32 secs over five runs. Still need to speed it up and will use cython to do that. Thanks a lot! Should've figured the math out. – Akshay Sakariya, Jun 24 '16 at 19:39

score 0 · Answer 2 · answered Jun 23 '16 at 23:32

Forget about `np.nditer`. It does not improve the speed of iterations. Only use it if you intend to go on and use the C version (via cython).

I'm puzzled about that inner loop. What is it supposed to do that is special? Why the loop?

In tests with simulated values these 2 blocks of code produce the same thing:

interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
    portfolio_value.append(interpolated_returns[j-1]*portfolio[j-1])
    portfolio_value[j] = portfolio_value[j]-1

interpolated_returns = (interpolated_returns+1)*portfolio - 1
portfolio_value = portfolio_value + interpolated_returns.tolist()

I'm assuming that `interpolated_returns` and `portfolio` are 1d arrays of the same length.

answered Jun 23 '16 at 23:32

hpaulj


I changed the code. The variable `portfolio` should be `portfolio_value`. – Akshay Sakariya Jun 24 '16 at 01:06
In that case, `np.cumprod` might be useful. – hpaulj Jun 24 '16 at 01:37
Yes, I tried that but the logic will fail. I need to remove that $1 from the portfolio every one of those 600 iterations. Mathematically, it isn't correct. What I need is the next multiplication done after a subtraction. In tandem. But cumprod will do all the multiplication at once without allowing the subtraction of $1. – Akshay Sakariya Jun 24 '16 at 02:08
I'd recommend doing the computation algebraically for a few terms, and looking for patterns that you could exploit. The little I did suggests that `cumprod` followed by `cumsum` might work (sum partial products). – hpaulj Jun 24 '16 at 18:13
