What is the best way to convert a scikit-learn model (e.g. the result of a RandomForestClassifier fit) into a piece of C++, to get the fastest .so
that can be called from another ecosystem?

user3313834
-
I don't understand what you mean by "convert". `RandomForestClassifier` is [implemented in Python](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py#L372-L628), not as a compiled extension. If you want a fast compiled version you will need to re-write it. One option would be to use [Cython](http://cython.org/) which can convert a superset of Python to C so that it can be statically compiled. There's no zero-effort solution, though - you will need to add your own static type declarations etc. in order to see any appreciable speed-up. – ali_m Jul 19 '16 at 00:22
-
Yes, I agree, but creating/training/fitting the model and using it to make predictions are two different things. scikit-learn's RandomForestClassifier creates the model, which is usually stored in a pickle as a collection of trees. In a high-speed RTB context I need to "use/accelerate" the resulting model, converting only the final "tree soup" into C++ code so it can be applied faster. I've seen PMML http://stackoverflow.com/questions/38431113/convert-a-pmml-describe-model-in-c-c but it does not seem to help much for our use case. – user3313834 Jul 19 '16 at 10:45
1 Answer
For portability of trained scikit-learn models to other languages, see the sklearn-porter project.
Whether this will be faster than the original RandomForestClassifier.predict
method (which is multithreaded and uses numpy operations, potentially with a fast BLAS library) remains to be seen, though.
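
To give an idea of what ported code might look like, here is a minimal hand-written sketch, not the output of sklearn-porter. It assumes each fitted tree has been flattened into parallel arrays similar to the ones scikit-learn exposes on `estimator.tree_` (`children_left`, `children_right`, `feature`, `threshold`). The `FlatTree` struct, the hard-coded one-split model and the `rf_predict` entry point are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat layout for one tree. The fields mirror the arrays found
// on scikit-learn's estimator.tree_, but this struct is an assumption.
struct FlatTree {
    std::vector<int> left;          // left child index, -1 at a leaf
    std::vector<int> right;         // right child index, -1 at a leaf
    std::vector<int> feature;       // feature tested at each node (unused at leaves)
    std::vector<double> threshold;  // split threshold at each node
    std::vector<int> leaf_class;    // predicted class, read only at leaves
};

// Walk one tree from the root: go left when x[feature] <= threshold.
static int predict_tree(const FlatTree& t, const double* x) {
    int node = 0;
    while (t.left[node] != -1) {
        node = (x[t.feature[node]] <= t.threshold[node]) ? t.left[node]
                                                         : t.right[node];
    }
    return t.leaf_class[node];
}

// Hard majority vote over all trees (a simplification, see the note below).
static int predict_forest(const std::vector<FlatTree>& forest,
                          const double* x, int n_classes) {
    assert(n_classes > 0);
    std::vector<int> votes(n_classes, 0);
    for (const FlatTree& t : forest) {
        ++votes[predict_tree(t, x)];
    }
    int best = 0;
    for (int c = 1; c < n_classes; ++c) {
        if (votes[c] > votes[best]) best = c;
    }
    return best;
}

// Hypothetical hard-coded model: one tree with a single split on
// feature 0 at 0.5; node 1 is the class-0 leaf, node 2 the class-1 leaf.
static const std::vector<FlatTree>& model() {
    static const std::vector<FlatTree> forest = {
        FlatTree{{1, -1, -1}, {2, -1, -1}, {0, -2, -2},
                 {0.5, 0.0, 0.0}, {0, 0, 1}},
    };
    return forest;
}

// C-linkage entry point so the .so can be called from other ecosystems.
extern "C" int rf_predict(const double* x) {
    return predict_forest(model(), x, 2);
}
```

Something like `g++ -O2 -shared -fPIC model.cpp -o libmodel.so` (file names are placeholders) would build it as a shared library. Note that scikit-learn's RandomForestClassifier averages the per-tree class probabilities rather than counting hard votes, so a faithful port would store the leaf class distributions and average them; the vote above can disagree with `predict` on close calls.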

rth