For context, I am taking Ad listing data for Machines and using it to predict the type of Machine.
I have used RandomForestClassifier for class prediction. In the model I used LabelEncoder to convert all categorical variables, including the target label (for example, 'Excavator' becomes 5). After running the model successfully, I am left with an array of predicted values, which are still the encoded numeric values. What I would like to do now is convert these predictions back into their original strings, e.g. map the number 5 back to its original value of 'Excavator', ideally collecting all of the predicted values in one DataFrame.
I have left out a lot of the code below, as I don't want to drown people in the full script; I have kept only what I think is most relevant to my question, but if you need to see more in order to help then please let me know!
### ENCODE CATEGORICAL VARIABLES ###
# Convert categorical string columns to integer codes
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Choose columns to encode
cols = ['make', 'model_of_Ad', 'year_manufactured', 'business', "tag_name_deep"]
# Encode columns (note: this re-fits a single throwaway encoder on each column, so the fitted mappings are not kept anywhere)
df[cols] = df[cols].apply(LabelEncoder().fit_transform)
# Reset df index
df.reset_index(drop=True, inplace=True)
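One variation I have been looking at (not sure whether it is the right way to go) is keeping a separate fitted LabelEncoder per column in a dict, so the mappings would still be available for inverse_transform later. The names below mirror my df and cols from above:

from sklearn.preprocessing import LabelEncoder

# Keep one fitted encoder per column so each mapping can be reversed later
encoders = {}
for col in cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])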
....
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
# define the model
rf = RandomForestClassifier()
# fit the model on the training set
rf.fit(X_train, y_train)
# Predict on the test set in order to assess accuracy
y_pred = rf.predict(X_test)
# Model Accuracy, how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# See predicted values
print(y_pred)
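To illustrate the end result I am after, something along these lines (this snippet assumes I still have a fitted encoder for the target column, here hypothetically called le_target, which my current code does not keep, and that X_test retains its index):

import pandas as pd

# le_target is a placeholder for a fitted LabelEncoder for the target column
# Map the encoded predictions back to their original strings, e.g. 5 -> 'Excavator'
predicted_labels = le_target.inverse_transform(y_pred)
results = pd.DataFrame({'predicted_machine_type': predicted_labels}, index=X_test.index)
print(results.head())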
Any help is appreciated!