I am aware that tree-based models are sensitive to one-hot encoded (OHE) targets, but I want to understand why the classifier returns predictions like this:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
.
.
.
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 0]])
For most of the samples it predicts no class at all. I will encode my targets as ordinal instead (which is applicable here), but what if it weren't? What should I do then? This is how the target looks before OHE:
array(['4 -8 weeks', '13 - 16 weeks', '17 - 20 weeks', ..., '9 - 12 weeks',
'13 - 16 weeks'], dtype=object)
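For context, this is roughly what I mean by encoding the target as ordinal. It is only a sketch: the explicit category order below is illustrative, and I'm reusing the Class variable from the code further down.

from sklearn.preprocessing import OrdinalEncoder
import numpy as np

# Hypothetical explicit ordering of the week-range classes (illustrative only)
ordered_classes = [['4 -8 weeks', '9 - 12 weeks', '13 - 16 weeks', '17 - 20 weeks']]

enc = OrdinalEncoder(categories=ordered_classes)
# OrdinalEncoder expects a 2-D array, so reshape the 1-D target and flatten the result
y_ordinal = enc.fit_transform(np.array(Class).reshape(-1, 1)).ravel()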
Full code:
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# One-hot encode the targets
mlb = LabelBinarizer()
b = mlb.fit_transform(Class)
list(mlb.classes_)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data, b, test_size=0.2, random_state=42)

# Create a multi-label classifier
classifier = RandomForestClassifier()

# Train the classifier
classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
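This is the quick check I'm using to see how often no class is predicted, assuming y_pred is the 0/1 matrix shown at the top:

import numpy as np

# Fraction of test samples where no class column was predicted as 1
empty_rows = (y_pred.sum(axis=1) == 0).mean()
print(empty_rows)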