
I was wondering how to run a multi-class, multi-label, ordinal classification with sklearn. I want to predict a ranking of target groups, ranging from the one that is most prevalent at a certain location (1) to the one that is least prevalent (7). I don't seem to be able to get it right. Could you please help me out?


# Random Forest Classification

# Import
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.metrics import make_scorer, accuracy_score, confusion_matrix, f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Import dataset
dataset = pd.read_excel('alle_probs_edit.v2.xlsx')
X = dataset.iloc[:,4:-1].values
Y = dataset.iloc[:,-1].values

# Split in Train and Test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)

# Scaling the features (putting all variables on the same scale); whether this is necessary depends on the chosen method
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Create classifier
classifier = RandomForestClassifier(criterion = 'entropy')

# Choose some parameter combinations to try
parameters = {'bootstrap': [True, False],
 'max_depth': [50],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 3, 4],
 'min_samples_split': [9, 10, 11, 12, 13],
 'n_estimators': [500,1000,1500]}

# Type of scoring used to compare parameter combinations
acc_scorer = make_scorer(accuracy_score)

# Run the grid search
grid_obj = GridSearchCV(classifier, parameters, scoring=acc_scorer, cv = 3, n_jobs = -1)
grid_obj = grid_obj.fit(X_train, Y_train)

# Set the classifier to the best combination of parameters
classifier = grid_obj.best_estimator_

# Fit the best algorithm to the data
classifier.fit(X_train, Y_train)

# Predict the test data
Y_pred = classifier.predict(X_test)

# Confusion matrix
cm = pd.DataFrame(confusion_matrix(Y_test, Y_pred))

# Accuracy
accuracy1 = accuracy_score(Y_test, Y_pred)
print("Accuracy1: %.2f%%" % (accuracy1 * 100.0))

# k-Fold Cross-Validation
accuracies = cross_val_score(estimator = classifier, X = X_train, y = Y_train, cv = 10)
print("Mean accuracy: %.4f (std: %.4f)" % (accuracies.mean(), accuracies.std()))

– Lemor

3 Answers


This may not be the precise answer you're looking for, but this article outlines a technique as follows:

We can take advantage of the ordered class value by transforming a k-class ordinal regression problem into a k-1 binary classification problem: we convert an ordinal attribute A* with ordinal values V1, V2, V3, …, Vk into k-1 binary attributes, one for each of the original attribute's first k − 1 values. The ith binary attribute represents the test A* > Vi.

Essentially, aggregate multiple binary classifiers (predict target > 1, target > 2, target > 3, target > 4) to be able to predict whether a target is 1, 2, 3, 4 or 5. The author creates an OrdinalClassifier class that stores multiple binary classifiers in a Python dictionary.
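
For example, with classes 1–5, suppose the four binary models give Pr(y > 1) = 0.9, Pr(y > 2) = 0.6, Pr(y > 3) = 0.2 and Pr(y > 4) = 0.1 (illustrative numbers). The class probabilities then become Pr(1) = 1 − 0.9 = 0.1, Pr(2) = 0.9 − 0.6 = 0.3, Pr(3) = 0.6 − 0.2 = 0.4, Pr(4) = 0.2 − 0.1 = 0.1 and Pr(5) = 0.1, which sum to 1.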

import numpy as np

from sklearn.base import clone
from sklearn.metrics import accuracy_score

class OrdinalClassifier():

    def __init__(self, clf):
        self.clf = clf
        self.clfs = {}

    def fit(self, X, y):
        self.unique_class = np.sort(np.unique(y))
        if self.unique_class.shape[0] > 2:
            for i in range(self.unique_class.shape[0]-1):
                # for each k - 1 ordinal value we fit a binary classification problem
                binary_y = (y > self.unique_class[i]).astype(np.uint8)
                clf = clone(self.clf)
                clf.fit(X, binary_y)
                self.clfs[i] = clf

    def predict_proba(self, X):
        clfs_predict = {k: self.clfs[k].predict_proba(X) for k in self.clfs}
        predicted = []
        for i, y in enumerate(self.unique_class):
            if i == 0:
                # V1 = 1 - Pr(y > V1)
                predicted.append(1 - clfs_predict[i][:,1])
            elif i in clfs_predict:
                # Vi = Pr(y > Vi-1) - Pr(y > Vi)
                predicted.append(clfs_predict[i-1][:,1] - clfs_predict[i][:,1])
            else:
                # Vk = Pr(y > Vk-1)
                predicted.append(clfs_predict[i-1][:,1])
        return np.vstack(predicted).T

    def predict(self, X):
        # note: this returns the index into unique_class, not the label itself;
        # map back with self.unique_class[...] if your labels are not 0..k-1
        return np.argmax(self.predict_proba(X), axis=1)

    def score(self, X, y, sample_weight=None):
        _, indexed_y = np.unique(y, return_inverse=True)
        return accuracy_score(indexed_y, self.predict(X), sample_weight=sample_weight)

The technique originates in the paper A Simple Approach to Ordinal Classification (Frank & Hall, 2001).
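
For the question as posed, a minimal usage sketch could look like this (assuming the X_train/Y_train/X_test variables from the question, with Y holding the ranks 1–7; the forest's hyperparameters are illustrative):

# wrap the random forest from the question in the OrdinalClassifier
oc = OrdinalClassifier(RandomForestClassifier(criterion='entropy', n_estimators=500))
oc.fit(X_train, Y_train)

pred_idx = oc.predict(X_test)       # indices into the sorted unique classes
Y_pred = oc.unique_class[pred_idx]  # map back to the original rank labels 1..7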

– Kartik Chugh
  • You might want to add some inheritance for OrdinalClassifier: ``` from sklearn.base import clone, BaseEstimator, ClassifierMixin class OrdinalClassifier(BaseEstimator, ClassifierMixin): ... ``` Then, if you want to use something like GridSearchCV, you can create a subclass for a specific algorithm: ``` class KNeighborsOrdinalClassifier(OrdinalClassifier): def __init__(self, n_neighbors=5, ...): self.n_neighbors = n_neighbors ... self.clf = KNeighborsClassifier(n_neighbors=self.n_neighbors, ...) self.clfs = {} ``` – David Diaz Nov 13 '20 at 21:08
  • @David Diaz I am currently working with the OrdinalClassifier from Kartik Chugh and was indeed looking for a way to use GridSearch or RandomSearch. I think I get what you propose, but I'm really not sure how to implement this. Could you maybe give a code example? Thanks in advance! – t.pellegrom Jan 26 '21 at 15:33
  • There's some good reference material here I've used before: http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/ Hope that helps! – Kartik Chugh Jan 27 '21 at 05:06
  • @t.pellegrom, I've posted an example with KNN now. Hopefully close enough so you can kick it and get it running! – David Diaz Jan 28 '21 at 21:24
  • Inspired by this, and agreeing with @DavidDiaz that a subclass is needed to support grid search, pipelines, etc., I took a stab at a generic wrapper for any classifier. I have not tested it extensively yet, but I did stub in some code that addresses a problem raised here about how some probabilities don't sum to 1: https://towardsdatascience.com/simple-trick-to-train-an-ordinal-regression-with-any-classifier-6911183d2a3c. I used sklearn's OvR implementation as a starting point: https://github.com/leeprevost/OrdinalClassifier/blob/main/ordinal.py. Have not tested yet. Will post progress... – leeprevost May 09 '22 at 19:04

Here is an example using KNN that should be tunable in an sklearn pipeline or grid search.

import numpy as np

from sklearn.neighbors import KNeighborsClassifier
from sklearn.base import clone, BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_is_fitted, check_array
from sklearn.utils.multiclass import check_classification_targets

class KNeighborsOrdinalClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, n_neighbors=5, *, weights='uniform', 
                 algorithm='auto', leaf_size=30, p=2, 
                 metric='minkowski', metric_params=None, n_jobs=None):
        
        self.n_neighbors = n_neighbors
        self.weights = weights
        self.algorithm = algorithm
        self.leaf_size = leaf_size
        self.p = p
        self.metric = metric
        self.metric_params = metric_params
        self.n_jobs = n_jobs
        
    def fit(self, X, y):
        X, y = check_X_y(X, y)
        check_classification_targets(y)
        
        self.clf_ = KNeighborsClassifier(**self.get_params())
        self.clfs_ = {}
        self.classes_ = np.sort(np.unique(y))
        if self.classes_.shape[0] > 2:
            for i in range(self.classes_.shape[0]-1):
                # for each k - 1 ordinal value we fit a binary classification problem
                binary_y = (y > self.classes_[i]).astype(np.uint8)
                clf = clone(self.clf_)
                clf.fit(X, binary_y)
                self.clfs_[i] = clf
        return self
    
    def predict_proba(self, X):
        X = check_array(X)
        check_is_fitted(self, ['classes_', 'clf_', 'clfs_'])
        
        # note: indexing clfs_predict by the label y below assumes the classes
        # are the consecutive integers 0..k-1 (see the comments for a generalization)
        clfs_predict = {k: self.clfs_[k].predict_proba(X) for k in self.clfs_}
        predicted = []
        for i, y in enumerate(self.classes_):
            if i == 0:
                # V1 = 1 - Pr(y > V1)
                predicted.append(1 - clfs_predict[y][:,1])
            elif y in clfs_predict:
                # Vi = Pr(y > Vi-1) - Pr(y > Vi)
                predicted.append(clfs_predict[y-1][:,1] - clfs_predict[y][:,1])
            else:
                # Vk = Pr(y > Vk-1)
                predicted.append(clfs_predict[y-1][:,1])
        return np.vstack(predicted).T
    
    def predict(self, X):
        X = check_array(X)
        check_is_fitted(self, ['classes_', 'clf_', 'clfs_'])
        
        # note: returns the index into classes_, not the class label itself
        return np.argmax(self.predict_proba(X), axis=1)
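
Since the point of the subclass is sklearn compatibility, here is a quick sketch of how it might be dropped into a grid search (grid values are illustrative; it assumes X_train/Y_train as in the question, with integer labels 0..k-1 so that the inherited accuracy score lines up with predict):

from sklearn.model_selection import GridSearchCV

param_grid = {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']}
search = GridSearchCV(KNeighborsOrdinalClassifier(), param_grid, cv=3, n_jobs=-1)
search.fit(X_train, Y_train)
print(search.best_params_, search.best_score_)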

– David Diaz
  • Thank you very much. This helps a lot! I have one more question. This method appears to have a bias towards the middle classes (if I have 10 classes, it classifies 80% of observations as classes 5 and 6, even though all classes appear exactly the same number of times in my data). What can cause this and how can I try to mitigate it? – t.pellegrom Feb 19 '21 at 01:43
  • 1. Visualize your data to see if you can see any ways to separate them given your input data. 2. You may also need to explore feature transformations (PCA, LDA, etc.) to better separate your classes. 3. If you have domain expertise to inform a preference for precision vs. recall, you could explore weighting of certain classes or customize a more informative scoring metric than f1, precision, recall, etc. 4. I've also found it helpful to make a "dummy" classifier for benchmarking. – David Diaz Feb 20 '21 at 04:56
  • Did you test it? I had to change the function to ``` def predict_proba(self, X): ... for i,y in enumerate(self.classes_): if i == 0: # V1 = 1 - Pr(y > V1) predicted.append(1 - clfs_predict[i][:,1]) elif y in clfs_predict: # Vi = Pr(y > Vi-1) - Pr(y > Vi) predicted.append(clfs_predict[i-1][:,1] - clfs_predict[i][:,1]) else: # Vk = Pr(y > Vk-1) predicted.append(clfs_predict[i-1][:,1]) return np.vstack(predicted).T ``` – Fernando Felix May 01 '21 at 02:40
  • @FernandoFelix, I have not tested it. I think you are right if your target variable does not start at zero and increment by 1. It does seem like the edits you proposed would allow the target variable to have different levels, and it is a more generalized version of what I wrote. – David Diaz Oct 14 '21 at 15:51
  • @FernandoFelix I think the revision you're suggesting could also be achieved by storing the classifiers in self.clfs_ using the class labels as the key to the dictionary instead of the place in the ordinal levels (i.e., using `self.classes_[i]` as the key instead of `i`). – David Diaz Oct 14 '21 at 15:58

Building off David Diaz's and Kartik's answers above, the white paper, and others linked on Medium and attributed in the readme, I'm working on an OrdinalClassifier that is built on the sklearn framework and works well with sklearn pipelines, scoring, and cross-validation.

The OrdinalClassifier performs very well vs. standard non-ordinal multiclass classification and gives greater control over optimizing for precision/recall on the positive class (e.g., "high" in the diabetes disease progression classes low < medium < high). It supports any sklearn classifier that implements predict_proba. Cross-validation scores are shown in the repo.

OrdinalClassifier based on sklearn

https://github.com/leeprevost/OrdinalClassifier

At this time, I would not call it multi-label.

– leeprevost