I am using Random Forest in Python to classify my data into 6 classes. My data consist of X, Y, Z coordinates, some geometric features, and labels. I train the classifier on the geometric features and the labels of the training set (a random 70% of my data). On the test set (the remaining 30% of the data), I would also like to apply a probability threshold of, say, 50%, so that samples predicted with less than 50% probability are assigned to a class 6, which represents unknown, while all the others get classes 0 to 5 as usual. The predicted labels should be in the same order as the test set, so that I can easily associate them with the XYZ coordinates for visualization purposes. How could I implement that in Python?
1 Answer
If I understand you correctly: if none of your six classes (0 to 5) has a probability of at least 0.5, you want to assign the sample to an additional class 6, named unknown?
You can use the predict_proba method of the random forest classifier (RF below). It gives a "probability" for each of your classes, e.g. for two samples
pred = RF.predict_proba(X_test)
# pred (one row per sample, one column per class 0 to 5)
# [[0.2, 0.3, 0.1, 0.4, 0.0, 0.0],
#  [0.8, 0.1, 0.05, 0.05, 0.0, 0.0]]
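we would assign the first sample to unknown (class 6), because its highest probability is below 0.5, and the second to class 0, because the 0.8 sits in the first column.
You can then apply this to your entire test set. predict_proba returns one row per sample in the same order as X_test, so the resulting labels line up with the rows of your test set and therefore with their XYZ coordinates.
import numpy as np
pred = RF.predict_proba(X_test)
# RF.classes_ maps each column of pred to its label (0 to 5 here); 6 means unknown
classes = [6 if p.max() < 0.5 else RF.classes_[np.argmax(p)] for p in pred]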
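For completeness, here is a minimal end-to-end sketch of the same idea, assuming scikit-learn's RandomForestClassifier and using synthetic stand-in data instead of your real point cloud. The names predict_with_threshold, UNKNOWN, xyz, features and labelled_points are hypothetical and only there for illustration. It also shows one possible way to keep the XYZ coordinates aligned with the predictions: pass them through train_test_split together with the features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

UNKNOWN = 6  # extra label for low-confidence predictions (hypothetical name)


def predict_with_threshold(clf, X, threshold=0.5, unknown_label=UNKNOWN):
    """Return one label per row of X, in the same order as X."""
    proba = clf.predict_proba(X)        # shape: (n_samples, n_classes)
    best = proba.argmax(axis=1)         # column index of the most likely class
    labels = clf.classes_[best]         # map column index to the actual label
    # Rows whose highest probability is below the threshold become unknown.
    return np.where(proba.max(axis=1) < threshold, unknown_label, labels)


# Synthetic stand-in data (assumption): XYZ, 4 geometric features, labels 0 to 5.
rng = np.random.default_rng(0)
n = 600
xyz = rng.uniform(0, 100, size=(n, 3))
features = rng.normal(size=(n, 4))
y = rng.integers(0, 6, size=n)

# Splitting xyz together with the features keeps the rows aligned.
X_train, X_test, y_train, y_test, xyz_train, xyz_test = train_test_split(
    features, y, xyz, test_size=0.3, random_state=0
)

RF = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = predict_with_threshold(RF, X_test)

# Row i of xyz_test belongs to y_pred[i], ready for visualization.
labelled_points = np.column_stack([xyz_test, y_pred])
```

Only the geometric features are used for training here, as in the question; the coordinates are just carried along so that each row of labelled_points holds X, Y, Z and the predicted label (6 for unknown).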

CutePoison