Before concatenation, both DataFrames have 7841 rows, but after concatenation the number of rows jumps to 9005.
import pandas as pd
from sklearn.preprocessing import PowerTransformer

trans_features = ['Customer_Age',
'Dependent_count',
'Contacts_Count_12_mon',
'Months_Inactive_12_mon',
'Credit_Limit',
'Total_Revolving_Bal',
'Total_Amt_Chng_Q4_Q1',
'Total_Trans_Amt',
'Total_Ct_Chng_Q4_Q1',
'Avg_Utilization_Ratio']
df_trans_feat = df_bank[trans_features]
pt = PowerTransformer()  # defaults to method='yeo-johnson'
transformed = pt.fit_transform(df_trans_feat)  # returns a NumPy array
df_transformed = pd.DataFrame(transformed, columns=df_trans_feat.columns)
print("df_transformed:", df_transformed.shape)
df_bank.drop(columns=trans_features, inplace=True)
print("df_bank:", df_bank.shape)
df_bank_comb = pd.concat([df_bank, df_transformed], axis=1)
print("df_bank_comb:", df_bank_comb.shape)
Output:
df_transformed: (7841, 10)
df_bank: (7841, 7)
df_bank_comb: (9005, 17)
The increase in the number of rows is baffling. My intention is to combine the two DataFrames horizontally (column-wise), so I expected 7841 rows and 17 columns. Is there a problem with my concat statement?
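One thing that may be worth checking is whether the two frames share the same index, since pd.concat with axis=1 aligns rows by index label and keeps the union of labels. The sketch below is only an illustration under an assumption: that df_bank had rows removed earlier, so its index has gaps, while df_transformed (built from a NumPy array) gets a fresh 0..n-1 index. The frames `left` and `right` and their values are made up for the example and are not from my data.

```python
import pandas as pd

# Hypothetical stand-in for df_bank: some rows were dropped earlier,
# so the index is [0, 2, 4] instead of 0..2.
left = pd.DataFrame({"a": [10, 20, 30]}, index=[0, 2, 4])

# A frame built without an explicit index gets a default RangeIndex 0..2.
right = pd.DataFrame({"b": [1.0, 2.0, 3.0]})

# concat(axis=1) aligns on index labels and takes their union,
# so the result has more rows than either input.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)

# Reusing the left frame's index keeps the rows paired one-to-one.
right_aligned = pd.DataFrame({"b": [1.0, 2.0, 3.0]}, index=left.index)
aligned = pd.concat([left, right_aligned], axis=1)
print(aligned.shape)
```

If this is what is happening in my case, passing index=df_bank.index when building df_transformed, or calling reset_index(drop=True) on df_bank before the concat, would presumably keep the row count at 7841.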