I am new to the random forest classifier. I am using it to classify a dataset that has two classes.

- The number of features is 512.
- The class proportions are 3:1, i.e., 75% of the data is from the first class and 25% from the second.
- I am using 500 trees.
The classifier produces an out-of-bag (OOB) error of 21.52%. The per-class error for the first class (which makes up 75% of the training data) is 0.0059, while the error for the second class is very high: 0.965.
I am looking for an explanation of this behaviour, and for suggestions on how to improve the accuracy on the second class.
I am looking forward to your help. Thanks.
I forgot to say that I'm using R, and that I used a nodesize of 1000 in the test above.
Here I repeated the training with only 10 trees and nodesize = 1 (just to give an idea); below are the function call in R and the resulting confusion matrix:
    randomForest(formula = Label ~ ., data = chData30PixG12, ntree = 10,
                 importance = TRUE, nodesize = 1, keep.forest = FALSE,
                 do.trace = 50)

                   Type of random forest: classification
                         Number of trees: 10
    No. of variables tried at each split: 22

            OOB estimate of  error rate: 24.46%
    Confusion matrix:
               Irrelevant Relevant class.error
    Irrelevant      37954     4510   0.1062076
    Relevant         8775     3068   0.7409440
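For reference, the `randomForest` package exposes parameters that target exactly this kind of imbalance (`strata`, `sampsize`, `classwt`, `cutoff`). Below is a minimal sketch, not a tested fix, of drawing a balanced, stratified bootstrap sample for each tree; it assumes the same `chData30PixG12` data frame and `Label` factor as in the call above:

    library(randomForest)

    # Size of the minority class ("Relevant" here): each tree's bootstrap
    # sample will contain this many cases from *both* classes.
    n_min <- min(table(chData30PixG12$Label))

    rf_balanced <- randomForest(
      Label ~ ., data = chData30PixG12,
      ntree      = 500,
      strata     = chData30PixG12$Label,
      sampsize   = c(n_min, n_min),  # one entry per class level
      importance = TRUE
    )

    print(rf_balanced$confusion)

An alternative is to leave the sampling alone and move the voting threshold instead, e.g. `cutoff = c(0.75, 0.25)`, so that a smaller share of tree votes suffices to predict the minority class.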