This is my first post in StackOverflow so hoping i don't break any posting conventions.
I have a dataframe that contains Acct_ID, Current_Balance, Credit_Limit and monthly EADs. Example Table
import pandas as pd
df = pd.DataFrame({'Acct_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Current_Balance': [8000, 7000, 6000, 8000, 7000, 6000, 3000, 2000, 5000, 5000],
'Credit_Limit': [10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000, 1000],
'EAD_1': [8500, 7500, 6500, 8500, 7500, 6500, 5500, 4500, 3500, 500],
'EAD_2': [9500, 8500, 7500, 6500, 5500, 4500, 3500, 2500, 1500, 1500],
'EAD_3': [10500, 9500, 8500, 7500, 6500, 5500, 4500, 3500, 2500, 2000],
'EAD_4': [12000, 11000, 10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000]})
I need to apply a cap to the EAD columns that uses the following logic:
If current balance exceed credit limit, then monthly EAD for each observation is the smaller of EAD and current balance. Alternatively, if current balance does not exceed credit limit, then monthly EAD for each observation is the smaller of EAD and credit limit. A floor of 0 is also being applied to the monthly EADs.
My background is in SAS and I have only recently started working with Python so struggling to find an efficient solution. I am including below the SAS equivalent for the outlined logic purely for reference:
if Current_Balance > Credit_limit then do;
EAD_&j. = Max(Min(EAD_&j.,Current_Balance),0);
end;
else do;
EAD_&j. = Max(Min(EAD_&j.,Credit_limit),0);
end;
I already have a working solution in Python but that takes ages to run. The sample i am currently working with has 325k observations and I need to apply the "capping" to 120 columns with monthly EADs. At the moment the below code in Python takes 40 mins to run. I dare not run this on the whole population of 12 million records...Just as a comparison, it takes me 2-3 minutes to do the same in SAS using the script noted above.
def EAD_LT_adjustment(curr_bal, credit_limit, EAD_t):
if curr_bal > credit_limit:
return max(min(EAD_t, curr_bal), 0)
else:
return max(min(EAD_t, credit_limit), 0)
for k in range(13, lifetime + 1):
EL_Acc2['EAD_LT_T' + str(k)] = EL_Acc2.apply(lambda x: EAD_LT_adjustment(x['curr_bal'], x['credit_limit'], x['EAD_LT_T' + str(k)]), axis=1)
Any ideas how to optimise the code? The objective is to get to the same result but in significantly less time.
Thank you.