6

What would be the best way to convert a scikit-learn model (e.g. the result of a RandomForestClassifier fit) into C++ code, so as to get the fastest possible .so that can be called from another ecosystem?

user3313834
  • I don't understand what you mean by "convert". `RandomForestClassifier` is [implemented in Python](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py#L372-L628), not as a compiled extension. If you want a fast compiled version you will need to re-write it. One option would be to use [Cython](http://cython.org/) which can convert a superset of Python to C so that it can be statically compiled. There's no zero-effort solution, though - you will need to add your own static type declarations etc. in order to see any appreciable speed-up. – ali_m Jul 19 '16 at 00:22
  • 1
    Yes, I agree, but creating/training/fitting the model and using it to make a prediction are two different things. scikit-learn's RandomForestClassifier creates the model, which is usually stored in a pickle as a collection of trees. In a high-speed RTB context I need to use and accelerate this resulting model, converting only the final "tree soup" into some C++ code so that it can be applied faster. I've seen PMML http://stackoverflow.com/questions/38431113/convert-a-pmml-describe-model-in-c-c but it does not seem to help much for our use case. – user3313834 Jul 19 '16 at 10:45

1 Answer

2

For porting trained scikit-learn models to other languages, see the sklearn-porter project.

However, whether this will be faster than the original `RandomForestClassifier.predict` method (which is multithreaded and uses numpy operations, potentially with a fast BLAS library) remains to be seen.
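To give an idea of what such ported code might look like, here is a minimal hand-written sketch, not the actual output of sklearn-porter. It assumes each tree has been flattened into an array of nodes (feature index, threshold, child indices, leaf label) and follows scikit-learn's split convention, where a sample goes left when `x[feature] <= threshold`. The names `Node`, `toy_forest` and `forest_predict` are hypothetical, the forest is a hard-coded toy with two features and two classes, and the trees are combined by a simple majority vote, whereas scikit-learn averages the per-tree class probabilities.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flat node layout; not a scikit-learn or sklearn-porter format.
struct Node {
    std::int32_t feature;   // index of the feature tested, or -1 for a leaf
    double threshold;       // go left when x[feature] <= threshold
    std::int32_t left;      // index of the left child within the tree
    std::int32_t right;     // index of the right child within the tree
    std::int32_t label;     // predicted class at a leaf, unused otherwise
};

using Tree = std::vector<Node>;

// Walk one tree from the root (index 0) down to a leaf.
static int predict_tree(const Tree& tree, const double* x) {
    std::int32_t i = 0;
    while (tree[i].feature >= 0) {
        const Node& n = tree[i];
        i = (x[n.feature] <= n.threshold) ? n.left : n.right;
    }
    return tree[i].label;
}

// Majority vote over all trees (a simplification: scikit-learn
// averages class probabilities instead of counting hard votes).
static int predict_forest(const std::vector<Tree>& forest,
                          const double* x, int n_classes) {
    std::vector<int> votes(n_classes, 0);
    for (const Tree& t : forest) {
        ++votes[predict_tree(t, x)];
    }
    int best = 0;
    for (int c = 1; c < n_classes; ++c) {
        if (votes[c] > votes[best]) best = c;
    }
    return best;
}

// Toy forest of three depth-1 trees; in practice these values would be
// generated from the fitted model rather than written by hand.
static std::vector<Tree> toy_forest() {
    return std::vector<Tree>{
        Tree{Node{0, 0.5, 1, 2, -1}, Node{-1, 0.0, -1, -1, 0}, Node{-1, 0.0, -1, -1, 1}},
        Tree{Node{1, 2.0, 1, 2, -1}, Node{-1, 0.0, -1, -1, 0}, Node{-1, 0.0, -1, -1, 1}},
        Tree{Node{0, 1.5, 1, 2, -1}, Node{-1, 0.0, -1, -1, 0}, Node{-1, 0.0, -1, -1, 1}},
    };
}

// C-linkage entry point so the shared library can be called from other
// ecosystems; x must point to at least two feature values.
extern "C" int forest_predict(const double* x) {
    static const std::vector<Tree> forest = toy_forest();
    return predict_forest(forest, x, 2);
}
```

Assuming the file is saved as `forest.cpp`, something like `g++ -O2 -shared -fPIC forest.cpp -o libforest.so` should produce a shared library exposing `forest_predict`. Whether this beats the batched Python `predict` depends on forest size and batch size, so it is worth benchmarking on your own data.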

rth