I'm trying to perform cross-validation over a Random Forest in Spark 1.6.0 and I'm finding it hard to obtain the evaluation metrics (precision, recall, F1, ...). I want the average of these metrics over all folds. Is it possible to obtain them with CrossValidator and MulticlassClassificationEvaluator?
I have only found examples where the evaluation is performed afterwards on an independent test dataset, using the best model from the cross-validation. I don't plan to use separate train and test sets; I want to use the whole DataFrame (df) for the cross-validation, let it make the splits, and then take the average of the per-fold metrics.
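To make what I'm after concrete, here is a minimal hand-rolled sketch of it (assuming the pipeline and evaluator from the code further below, and using randomSplit for the folds, so the fold sizes are only approximate):

from functools import reduce
from pyspark.sql import DataFrame

# Hand-rolled 5-fold loop: fit on 4 folds, score the held-out fold, average.
folds = df.randomSplit([0.2] * 5, seed=42)
scores = []
for i in range(len(folds)):
    train = reduce(DataFrame.unionAll, [f for j, f in enumerate(folds) if j != i])
    fitted = pipeline.fit(train)
    scores.append(evaluator.evaluate(fitted.transform(folds[i])))
avg_metric = sum(scores) / len(scores)  # this average is what I want from CrossValidator

What I actually have with CrossValidator is the following: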
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# pipeline is my Pipeline ending in a RandomForestClassifier
paramGrid = ParamGridBuilder().build()  # empty grid: a single (default) parameter combination
evaluator = MulticlassClassificationEvaluator()  # metricName defaults to "f1"
crossval = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=paramGrid,
    evaluator=evaluator,
    numFolds=5)
model = crossval.fit(df)
evaluator.evaluate(model.transform(df))
For now, the last line of the code above, evaluator.evaluate(model.transform(df)), gives me a single metric for the best model, but it is computed on the same DataFrame the model was fitted on, so I'm not sure I'm doing this correctly.
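In case it's relevant, this is how I would read off the individual metrics from those same predictions (assuming the Spark 1.6 metricName values "f1", "precision" and "recall"):

predictions = model.transform(df)
for metric in ["f1", "precision", "recall"]:
    # evaluate() accepts a param map, so the same evaluator can compute each metric
    print("%s: %s" % (metric, evaluator.evaluate(predictions, {evaluator.metricName: metric})))

But again, these are computed on the full DataFrame, not averaged over the folds.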