I know how to use predict_proba() and the meaning of the output. Can anyone tell me how predict_proba() internally calculates the probability for a decision tree?
2 Answers
Here is the official source code for sklearn.tree.DecisionTreeClassifier's predict_proba method. I found it by going to the official scikit-learn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) and clicking [source] next to the predict_proba method: https://github.com/scikit-learn/scikit-learn/blob/98cf537f5/sklearn/tree/_classes.py#L897. I have also included a snippet of the source code for predict_proba below:
def predict_proba(self, X, check_input=True):
    """Predict class probabilities of the input samples X.

    The predicted class probability is the fraction of samples of the same
    class in a leaf.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        The input samples. Internally, it will be converted to
        ``dtype=np.float32`` and if a sparse matrix is provided
        to a sparse ``csr_matrix``.

    check_input : bool, default=True
        Allow to bypass several input checking.
        Don't use this parameter unless you know what you're doing.

    Returns
    -------
    proba : ndarray of shape (n_samples, n_classes) or list of n_outputs \
        such arrays if n_outputs > 1
        The class probabilities of the input samples. The order of the
        classes corresponds to that in the attribute :term:`classes_`.
    """
    check_is_fitted(self)
    X = self._validate_X_predict(X, check_input)
    proba = self.tree_.predict(X)

    if self.n_outputs_ == 1:
        proba = proba[:, : self.n_classes_]
        normalizer = proba.sum(axis=1)[:, np.newaxis]
        normalizer[normalizer == 0.0] = 1.0
        proba /= normalizer

        return proba

    else:
        all_proba = []

        for k in range(self.n_outputs_):
            proba_k = proba[:, k, : self.n_classes_[k]]
            normalizer = proba_k.sum(axis=1)[:, np.newaxis]
            normalizer[normalizer == 0.0] = 1.0
            proba_k /= normalizer
            all_proba.append(proba_k)

        return all_proba
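To see the "fraction of samples of the same class in a leaf" idea concretely, here is a small sketch of my own (not part of the scikit-learn source, and using the iris dataset purely as an example): it recomputes the probabilities by hand from the per-leaf class totals stored in clf.tree_.value and compares them with predict_proba.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# clf.tree_.value has shape (n_nodes, n_outputs, n_classes): the per-node
# class totals that the docstring refers to.
leaf_ids = clf.apply(X)                    # index of the leaf each sample lands in
counts = clf.tree_.value[leaf_ids, 0, :]   # class totals of those leaves
manual_proba = counts / counts.sum(axis=1, keepdims=True)

# The division above is the same normalization step done inside predict_proba.
print(np.allclose(manual_proba, clf.predict_proba(X)))  # True

The normalization makes the check robust regardless of whether tree_.value holds raw counts or already-normalized fractions in your scikit-learn version.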

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 29 '22 at 08:00
I've edited my original answer to include more information about how I found this code! – awesomecosmos Dec 29 '22 at 22:14
First, watch this video for the basics of decision trees: https://www.youtube.com/watch?v=_L39rN6gz7Y, and then this one to see how these probabilities are calculated: https://www.youtube.com/watch?v=wpNl-JwwplA.
In short, predict_proba() returns the probability of occurrence of each of the classes (and predict() returns the class that has the maximum probability from predict_proba()).
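For example (a small sketch of my own, not from the linked videos, using the iris dataset as an assumption), you can check that predict() agrees with the argmax of predict_proba():

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba(X)                           # shape (n_samples, n_classes)
labels_from_proba = clf.classes_[proba.argmax(axis=1)] # class with the highest probability

# predict() returns exactly that highest-probability class for each sample.
print(np.array_equal(labels_from_proba, clf.predict(X)))  # True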
