
The RTrees API seems to have changed between OpenCV versions. The 2.4.1 documentation says RTrees supports both regression and classification, but I don't see how to choose between the two.

I want to use RTrees as a binary classifier in OpenCV 3.1. The documentation doesn't say how to do this, and RTrees::isClassifier() returns false with the following setup:

m_pTrees->setMaxDepth(20);
m_pTrees->setMinSampleCount(10);

cv::TermCriteria criteria(cv::TermCriteria::EPS, 0, 0);
m_pTrees->setTermCriteria(criteria);
m_pTrees->setCalculateVarImportance(false);
m_pTrees->setRegressionAccuracy(0);

// I assumed setting categories makes it a classifier.
m_pTrees->setMaxCategories(2);

// Always returns a float (that looks like the average of votes).
// I expected a single 0 or 1 (since max categories is 2).
m_pTrees->predict(sample);

EDIT: I've done some more legwork and looked into the OpenCV source code. RTrees creates a hidden implementation object, DTreesImplForRTrees, which extends the DTreesImpl class. DTreesImpl manages the _isClassifier member variable and sets it according to the response type of the TrainData passed to train().

From tree.cpp in the OpenCV source code:

_isClassifier = data->getResponseType() == VAR_CATEGORICAL;

At the moment, I don't see any way to configure the TrainData object so that getResponseType() returns VAR_CATEGORICAL. Perhaps it's because my training labels are stored as floats instead of integers? If I remember correctly, the data type was required to be CV_32F, but maybe I made an error somewhere.

Joey Carson

2 Answers


I'll answer my own question, since I found this confusing and couldn't find any obvious documentation on it. I only understood that the responses need to be treated as categorical after reading the source code for DTreesImpl. The documentation comment for TrainData::create (below) explains how to get that: store the responses as CV_32S rather than CV_32F.

I'm not sure whether it will make a major difference, though. Admittedly I'm very new to ML and OpenCV's implementation of it.

/** @brief Creates training data from in-memory arrays.

@param samples matrix of samples. It should have CV_32F type.
@param layout see ml::SampleTypes.
@param responses matrix of responses. If the responses are scalar, they should be stored as a
    single row or as a single column. The matrix should have type CV_32F or CV_32S (in the
    former case the responses are considered as ordered by default; in the latter case - as
    categorical)
 */
CV_WRAP static Ptr<TrainData> create(InputArray samples, int layout, InputArray responses,
                             InputArray varIdx=noArray(), InputArray sampleIdx=noArray(),
                             InputArray sampleWeights=noArray(), InputArray varType=noArray());
Joey Carson

Check out the example ~/opencv/samples/cpp/letter_recog.cpp. It uses RTrees for 26 classes (letters). To use it for binary class data, you just need to feed it data with 2 class labels (the responses in the code).

Steven