Index (or index-like) variable has a higher feature importance than the rest of the variables?

Asked Jun 12 '22 at 15:13

Active Jun 12 '22 at 15:13

Viewed 121 times

While evaluating the performance of an XGBoost model, I found that the transaction_id column, which is just a sequence of numbers from 1 to the length of the dataframe, has a higher feature importance than any of the other columns. I also have a column of random values, which has zero feature importance. Does keeping the transaction_id column cause data leakage when I do a random train-test split? There are multiple transaction_ids for a single person in the dataframe.
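To make the concern concrete, here is a minimal sketch of a split done per person rather than per row, with the index-like column left out of the features. It uses only the standard library and toy data; the person_id and amount columns and the group_train_test_split helper are hypothetical names chosen for illustration, not part of the actual dataframe. Whether a row-wise split really leaks information in this case is not established; the sketch only shows one way to rule it out.

```python
import random


def group_train_test_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so that all rows sharing a group value land on the same side."""
    # Collect the distinct groups and shuffle them reproducibly.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Reserve a fraction of the groups (not of the rows) for the test set.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [row for row in rows if row[group_key] not in test_groups]
    test = [row for row in rows if row[group_key] in test_groups]
    return train, test


# Hypothetical toy data: 12 transactions spread over 4 people.
rows = [
    {"transaction_id": i + 1, "person_id": i % 4, "amount": 10.0 * (i + 1)}
    for i in range(12)
]

train_rows, test_rows = group_train_test_split(rows, group_key="person_id")

# Exclude the index-like column and the grouping key from the model features.
feature_names = [
    name for name in rows[0] if name not in ("transaction_id", "person_id")
]
```

With pandas and scikit-learn, GroupShuffleSplit from sklearn.model_selection does the same kind of group-wise split. Retraining without transaction_id and comparing the test score would show how much the model actually depends on that column.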

asked Jun 12 '22 at 15:13

Divyanshu Chauhan

What do you think could be the answer? – AlexK Jun 13 '22 at 04:31
I'm not sure. I think it could also mean that as the number of transactions increases (which the index-like variable tracks), the chance of fraud increases. – Divyanshu Chauhan Jun 15 '22 at 15:37
