Is it possible to choose the combining strategy for MLlib's random forests? I can't find anything about it in the official API docs.

Here's my code:

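import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

val numClasses = 10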
val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 10 
val featureSubsetStrategy = "auto" 
val impurity = "entropy"
val maxDepth = 2
val maxBins = 320

val model = RandomForest.trainClassifier(trainData, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)

val predictionAndLabels = testData.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

I know that the predict method (implemented in the TreeEnsembleModel class) takes the combining strategy (Sum, Average or Vote) into account:

def predict(features: Vector): Double = {
    (algo, combiningStrategy) match {
      case (Regression, Sum) =>
        predictBySumming(features)
      case (Regression, Average) =>
        predictBySumming(features) / sumWeights
      case (Classification, Sum) => // binary classification
        val prediction = predictBySumming(features)
        // TODO: predicted labels are +1 or -1 for GBT. Need a better way to store this info.
        if (prediction > 0.0) 1.0 else 0.0
      case (Classification, Vote) =>
        predictByVoting(features)
      case _ =>
        throw new IllegalArgumentException(
          "TreeEnsembleModel given unsupported (algo, combiningStrategy) combination: " +
        s"($algo, $combiningStrategy).")
    }
}
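
To make the three strategies concrete, here is a minimal, self-contained Java sketch of how per-tree predictions could be combined under Sum, Average and Vote. It is only an illustration of the idea behind the method quoted above, not MLlib code: the class `CombiningSketch` and its methods are hypothetical, and the per-tree predictions and weights are passed in as plain arrays.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ensemble logic; this is not MLlib code.
public class CombiningSketch {

    // Sum: weighted sum of the per-tree predictions.
    static double sum(double[] preds, double[] weights) {
        double total = 0.0;
        for (int i = 0; i < preds.length; i++) {
            total += preds[i] * weights[i];
        }
        return total;
    }

    // Average: weighted sum divided by the total weight.
    static double average(double[] preds, double[] weights) {
        double sumWeights = 0.0;
        for (double w : weights) {
            sumWeights += w;
        }
        return sum(preds, weights) / sumWeights;
    }

    // Vote: each tree adds its weight to the class it predicts;
    // the class with the largest accumulated weight wins.
    static double vote(double[] preds, double[] weights) {
        Map<Integer, Double> votes = new HashMap<>();
        for (int i = 0; i < preds.length; i++) {
            votes.merge((int) preds[i], weights[i], Double::sum);
        }
        int best = -1;
        double bestWeight = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, Double> e : votes.entrySet()) {
            if (e.getValue() > bestWeight) {
                best = e.getKey();
                bestWeight = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three trees with equal weight; two of them predict class 1.
        double[] preds = {1.0, 0.0, 1.0};
        double[] weights = {1.0, 1.0, 1.0};
        System.out.println("sum     = " + sum(preds, weights));
        System.out.println("average = " + average(preds, weights));
        System.out.println("vote    = " + vote(preds, weights));
    }
}
```

With equal weights, Vote reduces to a plain majority vote, which is what one would expect for a random forest classifier, while Sum and Average only make sense when the per-tree outputs are numeric scores.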
Franjrg

1 Answer

I'd say the only way to do this is to use reflection after the model has been built. That should be possible, because the field is only read when predict is called (I haven't tried to run this code, but something like this should work).

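import java.lang.reflect.Field;

RandomForestModel model = ...;
// combiningStrategy appears to be declared on the parent TreeEnsembleModel,
// so look it up on the superclass rather than on RandomForestModel itself.
Field strategy = model.getClass().getSuperclass().getDeclaredField("combiningStrategy");
strategy.setAccessible(true); // the field is not public
strategy.set(model, whatever);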
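
Since I haven't run this against Spark, here is a self-contained sketch of the same reflection technique on stand-in classes. `BaseModel` and `ForestModel` are hypothetical and only mimic the situation where a non-public field is declared on a parent class; the helper walks up the hierarchy so it does not matter which class actually declares the field.

```java
import java.lang.reflect.Field;

public class ReflectionSketch {

    // Hypothetical stand-ins for the ensemble base class and the forest model.
    static class BaseModel {
        private String combiningStrategy;
        BaseModel(String strategy) { this.combiningStrategy = strategy; }
        String describe() { return combiningStrategy; }
    }

    static class ForestModel extends BaseModel {
        ForestModel() { super("Vote"); }
    }

    // Walk up the class hierarchy until some class declares the field.
    static Field findField(Class<?> start, String name) throws NoSuchFieldException {
        for (Class<?> c = start; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // Not declared on this class; try its parent.
            }
        }
        throw new NoSuchFieldException(name);
    }

    // Overwrite a (possibly non-public, possibly inherited) field on target.
    static void overwrite(Object target, String name, Object value)
            throws ReflectiveOperationException {
        Field field = findField(target.getClass(), name);
        field.setAccessible(true); // bypass the access check
        field.set(target, value);
    }

    public static void main(String[] args) throws Exception {
        ForestModel model = new ForestModel();
        overwrite(model, "combiningStrategy", "Average");
        System.out.println(model.describe());
    }
}
```

In the real case the value passed in would have to be one of the strategy objects the model expects rather than a string, and predict would still reject any (algo, combiningStrategy) pair it does not handle.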
evgenii