
I'm currently using the new CatBoost algorithm (Python version) and trying to export my model to a txt file so that I can transfer it to a C/Java implementation. Looking through the documentation, I have only found the save_model method, which accepts only two file formats: (1) binary and (2) CoreML for Apple.

Neither of these formats is suitable for me, so is there another way to achieve this?

pavko_a

2 Answers


There is no way to do this directly: CatBoost doesn't support serializing models to a text format so far.

However, CatBoost can already convert models to CoreML, and coremltools can print a CoreML model as human-readable, JSON-like text (protobuf text format). Here is a minimal example:

from sklearn import datasets
iris = datasets.load_iris()

import catboost
# the shortest possible model specification
cls = catboost.CatBoostClassifier(loss_function='MultiClass', iterations=1, depth=1)
cls.fit(iris.data, iris.target)

# save model to CoreML format
cls.save_model(
    "iris.mlmodel",
    format="coreml", 
    export_parameters={
        'prediction_type': 'probability'
    }
)

# coremltools can load the exported model and expose its protobuf spec
import coremltools
model = coremltools.models.model.MLModel("iris.mlmodel")
print(model.get_spec())  # prints the spec in protobuf text format
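Since the goal is a txt file rather than console output, the spec returned by get_spec() (a protobuf message) can also be written to disk. The sketch below assumes the protobuf package is installed (coremltools depends on it) and uses its text_format and json_format modules; the helper names write_spec_as_text and write_spec_as_json are hypothetical. The JSON variant produces strict JSON, which may be easier to parse from C or Java than the protobuf text format.

```python
from google.protobuf import json_format, text_format


def write_spec_as_text(spec, path):
    """Hypothetical helper: write a protobuf message (e.g. a CoreML spec) in protobuf text format."""
    with open(path, "w") as f:
        f.write(text_format.MessageToString(spec))


def write_spec_as_json(spec, path):
    """Hypothetical helper: write the same message as strict JSON."""
    with open(path, "w") as f:
        f.write(json_format.MessageToJson(spec))


# Usage with the iris model exported above (requires coremltools):
#   import coremltools
#   spec = coremltools.models.model.MLModel("iris.mlmodel").get_spec()
#   write_spec_as_text(spec, "iris_model.txt")
#   write_spec_as_json(spec, "iris_model.json")
```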

You probably need to read the coremltools documentation to fully understand what model.get_spec() prints, but you can read the output like this: "There is an ensemble of a single tree with 2 leaves: in leaf 0, class 0 dominates; in leaf 1, classes 1 and 2 dominate. Go to leaf 1 if feature 3 is greater than 0.8; otherwise, go to leaf 0."

specificationVersion: 1
description {
  input {
    name: "feature_3"
    type {
      doubleType {
      }
    }
  }
  output {
    name: "prediction"
    type {
      multiArrayType {
        shape: 3
        dataType: DOUBLE
      }
    }
  }
  predictedFeatureName: "prediction"
  predictedProbabilitiesName: "prediction"
  metadata {
    shortDescription: "Catboost model"
    versionString: "1.0.0"
    author: "Mr. Catboost Dumper"
  }
}
treeEnsembleRegressor {
  treeEnsemble {
    nodes {
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: 0.05084745649058943
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: -0.025423728245294732
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: -0.025423728245294732
      }
    }
    nodes {
      nodeId: 1
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: -0.02752293516463098
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: 0.01376146758231549
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: 0.013761467582315471
      }
    }
    nodes {
      nodeId: 2
      nodeBehavior: BranchOnValueGreaterThan
      branchFeatureIndex: 3
      branchFeatureValue: 0.800000011920929
      trueChildNodeId: 1
    }
    numPredictionDimensions: 3
    basePredictionValue: 0.0
    basePredictionValue: 0.0
    basePredictionValue: 0.0
  }
  postEvaluationTransform: Classification_SoftMax
}
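To check that reading of the dump, and to show roughly what a C/Java port has to do, here is a hand-ported evaluator in plain Python. It is a sketch under assumptions: the constant names are hypothetical, the numbers are copied from this particular one-tree model, and the false branch of node 2 is taken to be node 0 because proto3 text format omits fields equal to their default (0), just as the first node's nodeId is omitted. With more trees you would presumably add up the raw scores of the reached leaves across all trees before the softmax.

```python
import math

# Values copied from the CoreML dump above (hypothetical constant names).
# Each leaf holds one raw score per class (classes 0, 1, 2).
LEAF_VALUES = {
    0: [0.05084745649058943, -0.025423728245294732, -0.025423728245294732],
    1: [-0.02752293516463098, 0.01376146758231549, 0.013761467582315471],
}
SPLIT_FEATURE_INDEX = 3              # branchFeatureIndex of node 2
SPLIT_THRESHOLD = 0.800000011920929  # branchFeatureValue of node 2
BASE_PREDICTION = [0.0, 0.0, 0.0]    # basePredictionValue, one per class


def predict_proba(features):
    """Evaluate the single exported tree and apply softmax over the classes."""
    # Node 2 (BranchOnValueGreaterThan): true -> node 1; false -> node 0,
    # assuming the omitted falseChildNodeId is the protobuf default 0.
    leaf = 1 if features[SPLIT_FEATURE_INDEX] > SPLIT_THRESHOLD else 0
    raw = [base + value for base, value in zip(BASE_PREDICTION, LEAF_VALUES[leaf])]
    # postEvaluationTransform: Classification_SoftMax (max subtracted for stability)
    top = max(raw)
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


# Iris rows: [sepal length, sepal width, petal length, petal width]
print(predict_proba([5.1, 3.5, 1.4, 0.2]))  # petal width <= 0.8 -> leaf 0
print(predict_proba([7.0, 3.2, 4.7, 1.4]))  # petal width > 0.8 -> leaf 1
```

Nothing here depends on Python-specific features, so the same comparison, lookup and softmax translate line by line into C or Java.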

There is one downside to this approach: CoreML doesn't support the way CatBoost handles categorical features. So if you want to serialize a model with categorical features, you need to one-hot encode them before training.
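One way to do that encoding, shown as a plain-Python sketch: the helper one_hot_rows is hypothetical, and it assumes each row is a list with a single categorical column at cat_index. Numeric columns are kept in order and one 0/1 indicator per known category is appended. The category list is fixed up front (here from the training data) so that the C/Java side can rebuild exactly the same column layout; a category not seen during training simply becomes all zeros.

```python
def one_hot_rows(rows, cat_index, categories):
    """Hypothetical helper: replace column cat_index with 0/1 indicators for `categories`."""
    encoded = []
    for row in rows:
        numeric = [value for i, value in enumerate(row) if i != cat_index]
        indicators = [1.0 if row[cat_index] == c else 0.0 for c in categories]
        encoded.append(numeric + indicators)
    return encoded


train_rows = [[1.0, "red"], [2.5, "blue"], [3.0, "red"]]
# Fix the category order once and reuse it in the C/Java implementation.
categories = sorted({row[1] for row in train_rows})  # ["blue", "red"]
encoded = one_hot_rows(train_rows, 1, categories)
print(encoded)
# The purely numeric rows can then be passed to CatBoostClassifier.fit
# without cat_features, e.g. cls.fit(encoded, labels).
```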

David Dale

If you switch to the command-line program, you can use the --print-trees option. It only prints the trees of the model being trained, though, so you can't get the trees of an existing model.

Ha.