I have defined this function:
def RCP(row):
    # This function is what we use to predict the total number of purchases our customers will make over the
    # remainder of their lifetime as a customer. For each row in the dataframe, we call the library's
    # built-in `conditional_expected_number_of_purchases_up_to_time` with increasing t until the incremental
    # RCP is below a certain threshold.
    init_pur = 0  # running total as of the previous iteration
    current_pur = 0  # running total, updated on each iteration
    t = 1  # time
    eps_tol = 1e-6  # threshold for ending the loop
    while True:
        # add the incremental number of purchases between t-1 and t to the running total
        current_pur += (mbgf.conditional_expected_number_of_purchases_up_to_time(t, row['frequency'], row['recency'], row['T']) -
                        mbgf.conditional_expected_number_of_purchases_up_to_time((t-1), row['frequency'], row['recency'], row['T']))
        # if this iteration's increment is less than the threshold, stop the loop
        if (current_pur - init_pur < eps_tol):
            break
        init_pur = current_pur  # reset the starting loop value
        t += 1  # increment the time period by 1
    return current_pur
What I am trying to do is run this function on each row in my dataframe, increasing t until the difference between the current value and the previous value is less than my threshold (defined here by eps_tol), then move on to the next row.
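To make the stopping rule concrete, here is a minimal, self-contained sketch of the same per-row loop. It is an illustration under stated assumptions, not the fitted model: `expected_purchases` is a hypothetical stand-in for `mbgf.conditional_expected_number_of_purchases_up_to_time`, the rows are plain dicts rather than dataframe rows, and `max_t` is a safety cap that the original function does not have. It also relies on the observation that the increments telescope, so the value at t-1 can be carried over from the previous iteration instead of being recomputed, which means one model call per iteration rather than two.

```python
import math


def expected_purchases(t, frequency, recency, T):
    """Hypothetical stand-in for the fitted model's prediction (NOT the real formula).

    It only mimics a curve that starts at 0 and levels off as t grows.
    """
    rate = (frequency + 1.0) / (T + 1.0)
    return 10.0 * rate * (1.0 - math.exp(-0.05 * t))


def rcp(row, eps_tol=1e-6, max_t=100_000):
    """Increase t until the one-period increment drops below eps_tol."""
    args = (row["frequency"], row["recency"], row["T"])
    prev = expected_purchases(0, *args)  # value at t-1, carried between iterations
    total = 0.0
    for t in range(1, max_t + 1):  # max_t is an assumed cap to avoid an endless loop
        curr = expected_purchases(t, *args)
        increment = curr - prev
        total += increment
        if increment < eps_tol:
            break
        prev = curr
    return total


# Hypothetical example rows with the same keys the original function reads.
rows = [
    {"frequency": 2, "recency": 30.0, "T": 40.0},
    {"frequency": 0, "recency": 0.0, "T": 12.0},
]
results = [rcp(row) for row in rows]
```

This is still one Python-level loop per row, so it only trims the work per iteration; it does not remove the row-by-row cost.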
It works as expected, but it takes forever to run on dataframes of any meaningful size. I am currently working with a dataframe of 40k rows, and in some cases I will have dataframes with more than 100k rows.
Can anyone recommend how I might tweak this function, or rewrite it, so that it runs faster?