I'm working on a binary classification dataset and fitting an XGBoost model to it. Once the model is trained, I plot the feature importances and one of the trees from the underlying boosted ensemble. Please find these plots below.
Questions
- If I take a test set of, say, 10 data points, would the importance of the features vary from data point to data point when computing each data point's predict_proba score?
- By analogy with a CNN's class activation map, which varies from data point to data point: do the ordering and relative importance of the features stay the same when the model runs on multiple data points, or do they vary?
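
To make the second question concrete, here is a toy sketch of what "per-data-point importance" could mean for a single tree. It is not XGBoost's own computation: the tree, its node values, and the feature names (`age`, `income`, `tenure`) are all hypothetical and hand-written. Each row is walked down the tree, and every split feature on its path is credited with the change in node value it causes (a simple path-based attribution).

```python
# Hypothetical hand-built tree; every number here is made up for illustration.
# "value" is the prediction a node would give if the walk stopped there.
TREE = {
    "value": 0.5, "feature": "age", "threshold": 30,
    "left": {
        "value": 0.2, "feature": "income", "threshold": 50,
        "left": {"value": 0.1},
        "right": {"value": 0.4},
    },
    "right": {
        "value": 0.8, "feature": "tenure", "threshold": 5,
        "left": {"value": 0.6},
        "right": {"value": 0.9},
    },
}


def path_contributions(tree, row):
    """Walk one row down the tree and credit each split feature
    with the change in node value between parent and chosen child."""
    contribs = {}
    node = tree
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        go_left = row[feature] < node["threshold"]
        child = node["left"] if go_left else node["right"]
        contribs[feature] = contribs.get(feature, 0.0) + child["value"] - node["value"]
        node = child
    return node["value"], contribs


# Two made-up data points that differ only in "age".
rows = [
    {"age": 25, "income": 40, "tenure": 10},
    {"age": 45, "income": 40, "tenure": 10},
]

results = [path_contributions(TREE, row) for row in rows]
for row, (pred, contribs) in zip(rows, results):
    print(row, "->", pred, contribs)
```

In this sketch the two rows take different paths, so different features receive credit, and the shared feature gets a contribution of opposite sign; for each row the root value plus the contributions adds up to its leaf prediction. The global importance plot, by contrast, is a single summary over the whole model. For a real XGBoost model, `Booster.predict` accepts `pred_contribs=True`, which returns per-row feature contributions and may be the closer analogue to a class activation map.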