I have the below code, using which I can calculate the volume-weighted average price by three lines of Pandas code.
import numpy as np
import pandas as pd
from pandas.io.data import DataReader
import datetime as dt
df = DataReader(['AAPL'], 'yahoo', dt.datetime(2013, 12, 30), dt.datetime(2014, 12, 30))
df['Cum_Vol'] = df['Volume'].cumsum()
df['Cum_Vol_Price'] = (df['Volume'] * (df['High'] + df['Low'] + df['Close'] ) /3).cumsum()
df['VWAP'] = df['Cum_Vol_Price'] / df['Cum_Vol']
I am trying to find a way to code this without using cumsum()
as an exercise. I am trying to find a solution which gives the VWAP
column in one pass. I have tried the below line, using .apply()
. The logic is there, but the issue is I am not able to store values in row n in order to use in row (n+1). How do you approach this in pandas
- just use an external tuplet or dictionary for temporary storage of cumulative values?
df['Cum_Vol']= np.nan
df['Cum_Vol_Price'] = np.nan
# calculate running cumulatives by apply - assume df row index is 0 to N
df['Cum_Vol'] = df.apply(lambda x: df.iloc[x.name-1]['Cum_Vol'] + x['Volume'] if int(x.name)>0 else x['Volume'], axis=1)
Is there a one-pass solution to the above problem?
EDIT:
My main motivation is to understand what is happening under the hood. So, it is mainly for exercise than any valid reason. I believe each cumsum on a Series of size N has time complexity N (?). So I was wondering, instead of running two separate cumsum's, can we calculate both in one pass - along the lines of this. Very happy to accept an answer to this - rather than working code.