DecisionTreeClassifier
is certainly capable of multiclass classification. The "greater than" just happens to be illustrated in that link; arriving at that decision rule is a consequence of the effect it has on the information gain or the Gini impurity (see later in that page). Decision tree nodes generally hold binary rules, so they typically take the form of some value being greater than some threshold. The trick is transforming your data so it has good predictive values to compare.
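For instance, here is a minimal multiclass sketch on the built-in iris data (three classes); nothing here is special-cased for multiclass, it just works:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris has three classes, so this is multiclass out of the box
X, y = load_iris(return_X_y=True)

# Each internal node is a binary "feature <= threshold" test, chosen
# to maximize the impurity decrease (Gini here; entropy also works)
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

print(clf.predict(X[:3]))  # three class predictions, e.g. [0 0 0]
```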
To be clear, multiclass means your data (say, a document) is to be classified as exactly one of a set of possible classes. This is different from multilabel classification, where the document needs to be tagged with several classes out of the set of possible classes. Most of the scikit-learn classifiers support multiclass, and the library has a few meta-wrappers to accomplish multilabeling. You can also use probabilities (models with the predict_proba method) or decision function distances (models with the decision_function method) for multilabeling.
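As a rough sketch of the probability route (the random data, the model choice, and the 0.30 cutoff are all assumptions for illustration, not a prescribed recipe): keep every class whose predicted probability clears a threshold, which can yield zero, one, or several labels per sample:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data purely for illustration: 100 samples, 5 features, 3 classes
rng = np.random.default_rng(0)
X_train = rng.random((100, 5))
y_train = rng.integers(0, 3, 100)

clf = LogisticRegression().fit(X_train, y_train)

proba = clf.predict_proba(X_train[:5])  # shape (5, n_classes)
threshold = 0.30                        # assumed cutoff; tune per task
labels = [list(np.flatnonzero(row >= threshold)) for row in proba]
print(labels)  # each sample may get zero, one, or several class indices
```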
If you are saying you need to apply multiple labels to each datum (like ['red', 'sport', 'fast'] to cars), then to use trees/forests you need to create a unique label for each possible combination, which becomes your [0...K-1] set of classes. However, this assumes there is some predictive correlation in the data for each combination (of color, type, and speed in the cars example). For cars, there may be for red or yellow, fast sports cars, but it is unlikely for most other three-way combinations, so the data may be strongly predictive for those few combinations and very weak for the rest. You are better off using an SVM such as LinearSVC and/or wrapping it with OneVsRestClassifier or similar, as in the sketch below.
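Here is a minimal multilabel sketch along those lines; the toy car features and labels are made up for illustration:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: each car carries several labels at once
X = [[200, 1], [120, 0], [220, 1], [90, 0]]  # e.g. [top_speed, is_sport]
y = [["red", "sport", "fast"], ["blue"],
     ["yellow", "sport", "fast"], ["green"]]

# Turn the label sets into an indicator matrix, one column per label
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)

# One binary LinearSVC per label; avoids the combinatorial label blow-up
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

pred = clf.predict([[210, 1]])
print(mlb.inverse_transform(pred))  # e.g. [('fast', 'red', 'sport')]
```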