I have a dataframe in Python below:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
df.head()
I want to convert those values (string value: C, F, and Z) into some categories with this condition: if values in column CRDACCT_DLQ_CYC_1_MNTH_AGO, CRDACCT_DLQ_CYC_2_MNTH_AGO, ......., CRDACCT_DLQ_CYC_12_MNTH_AGO consist:
C = 0
F = 0
Z = 0
else value = value
#Convert value
df = df.replace({'C': 0, 'F': 0, 'Z': 0,' ':0}).astype(int)
Then, I want to create a new column with the name of MSD. MSD stands for Month Since Delinquent. MSD is calculated by identifying each of 12 columns CRDACCT_DLQ_CYC_1_MNTH_AGO, CRDACCT_DLQ_CYC_2_MNTH_AGO, .......up until CRDACCT_DLQ_CYC_12_MNTH_AGO with this kind of condition:
If value in CRDACCT_DLQ_CYC_1_MNTH_AGO > 1 then MSD = 1, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_2_MNTH_AGO > 1 then MSD = 2, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_3_MNTH_AGO > 1 then MSD = 3, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_4_MNTH_AGO > 1 then MSD = 4, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_5_MNTH_AGO > 1 then MSD = 5, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_6_MNTH_AGO > 1 then MSD = 6, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_7_MNTH_AGO > 1 then MSD = 7, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_8_MNTH_AGO > 1 then MSD = 8, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_9_MNTH_AGO > 1 then MSD = 9, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_10_MNTH_AGO > 1 then MSD = 10, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_11_MNTH_AGO > 1 then MSD = 11, otherwise MSD=0 or
If value in CRDACCT_DLQ_CYC_12_MNTH_AGO > 1 then MSD = 12, otherwise MSD=0
Note: otherwise if value 1 and 0, then MSD = 0.
index 0, MSD =1,because value 3 > 1 is in CRDACCT_DLQ_CYC_1_MNTH_AGO (we no need to check CRDACCT_DLQ_CYC_2_MNTH_AGO > 1 because we have found month since delinquent in CRDACCT_DLQ_CYC_1_MNTH_AGO) , hence MSD is in 1 MNTH AGO
index 1, MSD=1,
index 2, MSD=2,
index 3, MSD=2, because value 3 > 1 is in CRDACCT_DLQ_CYC_2_MNTH_AGO, hence MSD is in 2 MNTH AGO
index 4, MSD=2
Note: by checking each 12 columns with those conditions, If all values = 0 in each column CRDACCT_DLQ_CYC_1_MNTH_AGO, .....and CRDACCT_DLQ_CYC_12_MNTH_AGO, then MSD should be = 0.
Generally it is to check value > 1 in each 12 columns then determine the MSD value based on column name CRDACCT_DLQ_CYC_x_MNTH_AGO, x will be the value of MSD if > 1.