
Suppose I have successfully trained an XGBoost machine learning model in Python:

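```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# x, y: my own feature matrix and labels
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
model = XGBClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
```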

I want to port this model to another system that will be written in C/C++. To do this, I need to know the internal logic of the trained XGBoost model and translate it into a series of if-then-else statements, like decision trees, if I am not wrong.

How can this be done? How do I find out the internal logic of the trained XGBoost model so that I can implement it on another system?
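For reference, a trained XGBClassifier is an ensemble of regression trees whose leaf values are summed and then passed through a link function. `model.get_booster().get_dump(dump_format="json")` returns one JSON string per tree, where each split node has `split`, `split_condition`, `yes`, `no`, `missing` and `children` fields, and each leaf has a `leaf` value. Below is a minimal pure-Python sketch of evaluating such a dump (not XGBoost's own code), assuming a binary `binary:logistic` model, default feature names (`f0`, `f1`, …) and a `base_margin` equal to the logit of the model's `base_score` (0.0 when `base_score` is 0.5, the default in older releases). The function names are hypothetical. XGBoost compares values as 32-bit floats, so results may differ slightly for inputs that sit exactly on a split threshold.

```python
import json
import math


def eval_tree(node, x):
    """Follow one tree from the root to a leaf and return the leaf value.

    x maps feature names (e.g. "f0", "f1") to numbers; absent keys or NaN
    are treated as missing values.
    """
    while "leaf" not in node:
        value = x.get(node["split"])
        if value is None or math.isnan(value):
            next_id = node["missing"]
        elif value < node["split_condition"]:  # strict "<" selects the "yes" branch
            next_id = node["yes"]
        else:
            next_id = node["no"]
        # Children are listed in node["children"]; pick the one whose nodeid matches.
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
    return node["leaf"]


def predict_positive(tree_dumps, x, base_margin=0.0):
    """Binary logistic prediction: sigmoid(base_margin + sum of leaf values)."""
    margin = base_margin + sum(eval_tree(json.loads(t), x) for t in tree_dumps)
    return 1.0 / (1.0 + math.exp(-margin))
```

If this agrees with `model.predict_proba(x_test)[:, 1]` on your own test set, the same traversal (nested comparisons, a sum, a sigmoid) is what has to be reproduced in C/C++.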

I am using Python 3.7.

guagay_wk
  • Another way of thinking about this question is: "How can I serialize/deserialize the model in a format that both languages can understand?" Pickling may be the answer. This answer has a few potential options: https://stackoverflow.com/questions/1296162/how-can-i-read-a-python-pickle-database-file-from-c – JacobIRR Sep 28 '19 at 03:49
  • This does something similar, but doesn't yet offer C++ output: https://github.com/modusdatascience/sklearn2code. C++ could be added with a little work. – jcrudy Apr 27 '20 at 16:00
  • Have you been able to find out what the internal logic of XGBoost models is? – SBF Aug 25 '23 at 19:45

3 Answers


m2cgen is an awesome package that converts scikit-learn-compatible models into raw code. If you are using XGBoost's scikit-learn wrappers (which it looks like you are), you can simply do something like this:

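```python
import m2cgen as m2c
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(x_train, y_train)
# ...

# Translate the trained model into standalone C source and save it
with open('./model.c', 'w') as f:
    code = m2c.export_to_c(model)
    f.write(code)
```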

The really awesome thing about this package is that it supports many different types of models, such as:

  • Linear
  • SVM
  • Tree
  • Random Forest
  • Boosting

One more thing: m2cgen also supports multiple output languages, such as:

  • C
  • C#
  • Dart
  • Go
  • Haskell
  • Java
  • JavaScript
  • PHP
  • PowerShell
  • Python
  • R
  • Visual Basic

I hope this helps!

gnodab

Someone wrote a script that does exactly this. Check out https://github.com/popcorn/xgb2cpp
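Tools like this essentially walk each tree of the model dump and print nested if/else statements. A hypothetical minimal sketch of that idea (not how xgb2cpp itself is implemented), assuming the JSON dump from `model.get_booster().get_dump(dump_format="json")`, a binary `binary:logistic` model and default feature names `f0`, `f1`, …; the names `emit_node`, `dump_to_c` and `xgb_predict`, and the `base_margin` parameter (the logit of the model's `base_score`), are illustrative. The generated C uses `double` while XGBoost uses 32-bit floats internally, so borderline inputs may behave slightly differently.

```python
import json


def emit_node(node, depth, lines):
    """Append C statements for one tree node (and its subtree) to lines."""
    pad = "    " * (depth + 1)
    if "leaf" in node:
        lines.append(f"{pad}sum += {node['leaf']!r};")
        return
    children = {c["nodeid"]: c for c in node["children"]}
    # Assumes default feature names "f0", "f1", ... so the index can be parsed.
    feature = int(node["split"][1:])
    cond = f"x[{feature}] < {node['split_condition']!r}"
    if node["missing"] == node["yes"]:
        # Missing (NaN) values must take the "yes" branch explicitly.
        cond = f"isnan(x[{feature}]) || {cond}"
    # Otherwise NaN < threshold is false in C, so NaN already falls to "no".
    lines.append(f"{pad}if ({cond}) {{")
    emit_node(children[node["yes"]], depth + 1, lines)
    lines.append(f"{pad}}} else {{")
    emit_node(children[node["no"]], depth + 1, lines)
    lines.append(f"{pad}}}")


def dump_to_c(tree_dumps, base_margin=0.0, name="xgb_predict"):
    """Turn a list of JSON tree dumps into one C function returning P(class 1)."""
    lines = ["#include <math.h>", "",
             f"double {name}(const double *x) {{",
             f"    double sum = {base_margin!r};"]
    for dump in tree_dumps:
        emit_node(json.loads(dump), 0, lines)
    lines += ["    return 1.0 / (1.0 + exp(-sum));", "}"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.c` file gives a function that expects the features in the same column order used for training; link the math library for `exp` when compiling.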

Aaron

A common way to use any ML/DL model from another language is to wrap it in a simple RESTful API with Flask or Bottle (lightweight Python web frameworks) and call that API from whatever language you need.
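The answer suggests Flask or Bottle; to keep this sketch dependency-free it swaps in Python's standard-library `http.server` instead. `predict` is a hypothetical placeholder for something like `model.predict_proba`, and the `/predict` path and the `{"features": [...]}` JSON request shape are assumptions. A C/C++ program would then send an HTTP POST with a JSON body (for example with libcurl) and read the prediction from the JSON response.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for the real model so the sketch runs without xgboost."""
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        # Send the prediction back as JSON.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(host="127.0.0.1", port=8000):
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` serves on 127.0.0.1:8000. The Python documentation does not recommend `http.server` for production, which is where Flask/Bottle behind a proper server, or a containerized setup, comes in.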

You can also containerize the RESTful API with Docker if you are developing a large project with many models. Containerized RESTful APIs are also used to deploy models to the cloud, e.g. on AWS.

If you are interested in the logic behind any ML model, you can always look at its source code (on GitHub).