
How do I return all the hyperparameters of a CatBoost model?

NOTE: I do not think this is a dup of Print CatBoost hyperparameters since that question/answer doesn't address my need.

For example, with sklearn I can do:

rf = ensemble.RandomForestClassifier(min_samples_split=2)
print(rf)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

This returns all the hyperparameters, those I defined and the other defaults.

With CatBoost I can use .get_params(), but it seems to return only user-specified parameters:

cat = CatBoostClassifier(loss_function='Logloss',
                         verbose=False,
                         eval_metric='AUC',
                         iterations=500,
                         thread_count=None,
                         random_state=SEED)
print(cat.get_params())

{'iterations': 500, 'random_state': 42, 'verbose': False, 'eval_metric': 'AUC', 'loss_function': 'Logloss'}

For example, I'd like to know what learning_rate was used, but ideally I'd get the whole list.

ADJ

3 Answers


You can try changing your

print(cat.get_params())

to

print(cat.get_all_params())

Source: get_all_params documentation

Ralph Deint

You can find a detailed description of all training parameters with their default values here: https://catboost.ai/docs/concepts/python-reference_parameters-list.html#python-reference_parameters-list

Robert Lacok

I came across this looking for the same answer.

Unfortunately, it doesn't seem to be possible. Here's an excerpt from the documentation:

If the value of a parameter is not explicitly specified, it is set to the default value. In some cases, these default values change dynamically depending on dataset properties and values of user-defined parameters.

So because the defaults can change dynamically, the values reported after training won't necessarily match what was effectively passed in as input. I tried to retrieve most of the parameters, so I could at least keep track of whether those defaults change between versions. Here it is if it helps you:

from catboost import CatBoostClassifier, CatBoostRegressor
import random
import numpy as np

#Create fake dataset for testing:
random.seed(42)
X = np.array([random.random() for x in range(1000)])
y = X ** 2 + random.random()
y_class = [1 if x > 1 else 0 for x in y]

cbc = CatBoostClassifier() #Trend classifier
cbc.fit(X, y_class, verbose=False)
cbc.get_all_params()

#with the output:
{'nan_mode': 'Min', 'eval_metric': 'Logloss', 'iterations': 1000, 'sampling_frequency': 'PerTree', 'leaf_estimation_method': 'Newton', 'grow_policy': 'SymmetricTree', 'penalties_coefficient': 1, 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'l2_leaf_reg': 3, 'random_strength': 1, 'rsm': 1, 'boost_from_average': False, 'model_size_reg': 0.5, 'subsample': 0.800000011920929, 'use_best_model': False, 'class_names': [0, 1], 'random_seed': 0, 'depth': 6, 'border_count': 254, 'classes_count': 0, 'auto_class_weights': 'None', 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'learning_rate': 0.010301999747753143, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'MVS', 'max_leaves': 64}