I'm working on a project in which I have already selected my features, and now I want to check their importance. I have a few questions and would appreciate any help.
1- Does it make sense to use RandomForestClassifier with cross-validation to calculate feature importance?
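A minimal sketch of the setup question 1 describes, assuming synthetic data from make_classification in place of the real (already selected) features. Passing return_estimator=True to cross_validate keeps the forest fitted on each training fold, so its feature_importances_ can be read per fold and averaged; the spread across folds gives a rough idea of how stable each importance is.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Assumption: synthetic stand-in for the real selected features.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# return_estimator=True stores the model fitted on each training fold.
cv_results = cross_validate(clf, X, y, cv=10, return_estimator=True)

# One row per fold, one column per feature (impurity-based importances).
importances = np.array([est.feature_importances_ for est in cv_results["estimator"]])

mean_imp = importances.mean(axis=0)
std_imp = importances.std(axis=0)
for i, (m, s) in enumerate(zip(mean_imp, std_imp)):
    print(f"feature {i}: {m:.3f} +/- {s:.3f}")
```

These are scikit-learn's impurity-based importances, which are computed from the training folds; sklearn.inspection.permutation_importance evaluated on held-out data is a commonly used alternative.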
2- I tried to calculate the feature importance using the cross_validate function https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html . The function returns test_score and train_score results. With 10-fold cross-validation, I got the following results:
test_score: [0.99950158, 0.9997231, 0.9997231, 0.99994462, 0.99977848, 0.99983386, 0.99977848, 0.9997231, 0.99977847, 1.]
train_score: [0.99998769, 0.99998154, 0.99997539, 0.99997539, 0.99998154, 0.99997539, 0.99998154, 0.99997539, 0.99998154, 0.99997539]
Can anyone explain these results and what they indicate?
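For context on question 2, a minimal sketch, again assuming synthetic make_classification data: with scoring left at its default, cross_validate scores each held-out fold with the estimator's own score method, which for RandomForestClassifier is accuracy. So test_score and train_score are per-fold accuracies of the model, not feature importances. In recent scikit-learn versions, train_score is only returned when return_train_score=True.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Assumption: synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# scoring=None (default): each score is the classifier's accuracy.
# return_train_score=True also scores the model on its own training folds.
res = cross_validate(clf, X, y, cv=10, return_train_score=True)

test, train = res["test_score"], res["train_score"]
print("keys:", sorted(res))
print(f"test  accuracy: {test.mean():.4f} +/- {test.std():.4f}")
print(f"train accuracy: {train.mean():.4f} +/- {train.std():.4f}")
# A large train-minus-test gap would suggest overfitting; near-perfect
# test scores may also be worth checking for class imbalance or leakage.
print(f"mean gap: {train.mean() - test.mean():.4f}")
```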
3- The cross_validate function has a parameter called scoring, which accepts different values such as accuracy, balanced_accuracy and f1. What does the scoring parameter do, what do these values mean, and how should I decide which one to choose? I have already read the scikit-learn documentation, but it wasn't clear to me.
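To illustrate question 3, a minimal sketch under assumed conditions: an imbalanced synthetic binary dataset (about 95% of samples in class 0) and a DummyClassifier that always predicts the majority class as a hypothetical baseline. The scoring parameter chooses which metric is computed on each held-out fold; passing a list makes cross_validate return one test_<name> array per metric.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: imbalanced binary data, roughly 95% class 0 and 5% class 1.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_redundant=0, weights=[0.95], random_state=0)

# Each name selects a metric applied to every held-out fold.
scoring = ["accuracy", "balanced_accuracy", "f1"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    # Hypothetical baseline: always predicts the most frequent class.
    # (Some scikit-learn versions may warn that F1 is ill-defined here.)
    "always majority": DummyClassifier(strategy="most_frequent"),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    res = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: res[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

The majority-class baseline gets an accuracy of roughly 0.95 but a balanced accuracy of 0.5 (the average of per-class recalls, 1 and 0) and an F1 of 0 for the minority class, which is why accuracy alone can be misleading on imbalanced data. balanced_accuracy and f1 reward a model only if it actually identifies the rarer class.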
Thank you.