
I have defined this function:

def RCP(row):
    ### Predict the total number of purchases a customer will make over the
    ### remainder of their lifetime as a customer. For a given row of the dataframe,
    ### we call the library's built-in `conditional_expected_number_of_purchases_up_to_time`
    ### with increasing t until the incremental RCP falls below a threshold.
    ### `mbgf` is a fitted model object defined elsewhere.

    init_pur = 0 # cumulative purchases as of the previous iteration
    current_pur = 0 # cumulative purchases, updated on every iteration
    t = 1 # time period
    eps_tol = 1e-6 # stop once the increment falls below this threshold

    while True:
        ## incremental expected purchases between t-1 and t, added to the running total
        current_pur += (mbgf.conditional_expected_number_of_purchases_up_to_time(t, row['frequency'], row['recency'], row['T']) -
                        mbgf.conditional_expected_number_of_purchases_up_to_time(t - 1, row['frequency'], row['recency'], row['T']))
        # if this iteration added less than the threshold, stop the loop
        if current_pur - init_pur < eps_tol:
            break
        init_pur = current_pur # remember this iteration's total for the next comparison
        t += 1 # move to the next time period
    return current_pur
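Written out, the running total telescopes. Writing E(t) for `mbgf.conditional_expected_number_of_purchases_up_to_time(t, frequency, recency, T)` evaluated with one row's values:

$$
\texttt{current\_pur}_n = \sum_{k=1}^{n}\bigl[E(k) - E(k-1)\bigr] = E(n) - E(0),
\qquad
n^{*} = \min\{\, n \ge 1 : E(n) - E(n-1) < \varepsilon \,\}
$$

So `RCP(row)` returns E(n*) - E(0), where E(0) should be 0 for an expected number of purchases over zero time. In other words, the loop is really searching for the stopping time n*, and the value at that point needs only a single model evaluation rather than a sum of differences.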

What I am trying to do is run this function on each row of my dataframe, increasing t until the difference between the current and previous cumulative values is less than my threshold (`eps_tol`), and then move on to the next row.

It works as expected, but it is very slow on dataframes of any meaningful size. I am currently working with a dataframe of 40k rows, and in some cases I will have dataframes with more than 100k rows.
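One option (not from the thread) is to stop looping over rows in Python and instead evaluate every customer at each t in one array call, keeping a per-customer "still iterating" mask so that each row still gets its own stopping time. This is a minimal sketch assuming the model method accepts NumPy arrays for `frequency`, `recency` and `T`; lifetimes' fitters are generally written with NumPy operations, but that is worth checking for your version. `toy_expected_purchases` is a hypothetical stand-in for the real `mbgf` method so the example runs on its own, `rcp_vectorized` is an illustrative name, and `max_t` is an added safety cap that the original function does not have.

```python
import numpy as np


def toy_expected_purchases(t, frequency, recency, T):
    # HYPOTHETICAL stand-in for mbgf.conditional_expected_number_of_purchases_up_to_time.
    # Works element-wise on NumPy arrays; E(0) = 0 and E(t) levels off as t grows.
    rate = (frequency + 1.0) / (T + 1.0)
    decay = 1.0 / (1.0 + T - recency)
    return rate / decay * (1.0 - np.exp(-decay * t))


def rcp_vectorized(frequency, recency, T, model=toy_expected_purchases,
                   eps_tol=1e-6, max_t=100_000):
    """Apply RCP's stopping rule to all customers at once.

    Each customer stops at its own t (the first t whose increment is below
    eps_tol) and gets E(t) - E(0), matching what the row-wise loop returns.
    max_t is an added safety cap, not part of the original function.
    """
    frequency = np.asarray(frequency, dtype=float)
    recency = np.asarray(recency, dtype=float)
    T = np.asarray(T, dtype=float)

    e0 = np.asarray(model(0, frequency, recency, T), dtype=float)
    prev = e0.copy()                        # E(t-1) for every customer
    result = np.zeros_like(e0)
    active = np.ones(e0.shape, dtype=bool)  # customers that have not converged yet

    t = 1
    while active.any() and t <= max_t:
        idx = np.flatnonzero(active)
        # one model call per time step, restricted to unconverged customers
        cur = np.asarray(model(t, frequency[idx], recency[idx], T[idx]), dtype=float)
        done = (cur - prev[idx]) < eps_tol
        finished = idx[done]
        result[finished] = cur[done] - e0[finished]
        active[finished] = False
        prev[idx] = cur
        t += 1

    # customers that hit max_t keep their last cumulative value
    result[active] = prev[active] - e0[active]
    return result
```

With the real model the call would presumably look like `df['RCP'] = rcp_vectorized(df['frequency'], df['recency'], df['T'], model=mbgf.conditional_expected_number_of_purchases_up_to_time)`, provided the method is vectorized as assumed above. The number of Python-level loop iterations is then the largest stopping t across all customers, rather than the sum of stopping times over 40k or 100k rows.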

Can anyone recommend how I might tweak this function, or rewrite it, so that it runs faster?

  • You can get an easy factor of 2 by caching the value of the call into the mbgf object's method so you call it only once per iteration. Beyond that, how does the performance of conditional_expected_number_of_purchases_up_to_time() vary with the `t` argument, and what are the typical values of `t` for which your function is converging? – constantstranger Feb 13 '22 at 22:25
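The caching suggestion in the comment above amounts to keeping E(t-1) from the previous iteration instead of recomputing it. A minimal sketch under stated assumptions: `toy_expected_purchases` is a hypothetical stand-in for the `mbgf` method so the example runs without a fitted model, and `rcp_cached` is an illustrative name. Because the running sum telescopes, the function can return `cur - E(0)` directly instead of accumulating differences.

```python
import math


def toy_expected_purchases(t, frequency, recency, T):
    # HYPOTHETICAL stand-in for mbgf.conditional_expected_number_of_purchases_up_to_time
    rate = (frequency + 1.0) / (T + 1.0)
    decay = 1.0 / (1.0 + T - recency)
    return rate / decay * (1.0 - math.exp(-decay * t))


def rcp_cached(row, model=toy_expected_purchases, eps_tol=1e-6):
    """Same stopping rule as RCP(), but E(t-1) is reused from the previous
    iteration, so the model is called once per step instead of twice."""
    args = (row['frequency'], row['recency'], row['T'])
    e0 = model(0, *args)
    prev = e0
    t = 1
    while True:
        cur = model(t, *args)      # the only model call in this iteration
        if cur - prev < eps_tol:   # increment in expected purchases below threshold
            return cur - e0        # equals the running sum the original loop builds
        prev = cur
        t += 1
```

This halves the number of model calls per row, which is the factor of 2 mentioned in the comment; it does not change how many iterations each row needs before converging.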
