
While evaluating an XGBoost model, I found that the transaction_id column, which is just a sequence of integers from 1 to the length of the dataframe, has higher feature importance than any other column. I also have a column of random values, and it has zero feature importance. Does keeping transaction_id in the dataframe cause data leakage when I do a random train-test split? Note that a single person can have multiple transaction_ids in the dataframe.
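To make the setup concrete, here is a minimal sketch of a split that drops the row-order ID and keeps all of one person's transactions on the same side of the split. It uses scikit-learn's `GroupShuffleSplit` on synthetic data. The column names `person_id`, `amount`, and `label` are assumptions for illustration, since only `transaction_id` is named above, and this is one possible approach rather than a confirmed fix for the behaviour described.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the real data; every column except
# transaction_id is a hypothetical name chosen for this sketch.
rng = np.random.default_rng(0)
n_rows = 1000
df = pd.DataFrame({
    "transaction_id": np.arange(1, n_rows + 1),       # row-order ID, 1..n
    "person_id": rng.integers(0, 100, size=n_rows),   # many rows per person
    "amount": rng.normal(50.0, 10.0, size=n_rows),
    "label": rng.integers(0, 2, size=n_rows),
})

# Drop the identifiers so the model cannot key on row order or on the person.
X = df.drop(columns=["transaction_id", "person_id", "label"])
y = df["label"]
groups = df["person_id"]

# Split by person: roughly 20% of the people (not of the rows) go to test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

# Sanity check: no person should appear in both sets.
overlap = set(groups.iloc[train_idx]) & set(groups.iloc[test_idx])
print("people in both train and test:", len(overlap))
```

With a plain random row split, the same person would usually land in both sets, so comparing the validation score from the two kinds of split may show how much of the measured performance depends on that overlap.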
