Comparing the predicted class for each instance of test data from different models

Question

My test set contains about 50,000 instances, and I have trained several machine learning models. Now I want to compare their predictions, for example to check whether, for every instance x_i that model A predicted as 0, models B and C also predicted 0.

For example, below are the first 5 predictions by the models.

import pandas as pd

data = {'true_class': [3.0, 3.0, 3.0, 3.0, 3.0],
 'rf_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
 'mlp_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
 'knn_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
 'lg_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
 'ada_pred': [2.0, 2.0, 2.0, 2.0, 2.0]}

df = pd.DataFrame(data)
df
   true_class  rf_pred  mlp_pred  knn_pred  lg_pred  ada_pred
0         3.0      3.0       3.0       3.0      3.0       2.0
1         3.0      0.0       0.0       0.0      0.0       2.0
2         3.0      0.0       0.0       0.0      0.0       2.0
3         3.0      0.0       0.0       0.0      0.0       2.0
4         3.0      0.0       0.0       0.0      0.0       2.0

Clearly, rf_pred, mlp_pred, knn_pred and lg_pred make the same predictions for these five instances.

Is there a way to perform this kind of analysis, perhaps visually?

Søren · Answer 1 · 2022-07-06T11:52:47.723


Here's a heatmap approach: each row shows one model's predictions, each column is a test instance, and the color encodes the predicted class.

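import plotly.express as px

# `data` is the dict of prediction lists from the question.
# One row per model (labelled by its key), one column per test instance.
fig = px.imshow(list(data.values()), y=list(data.keys()))
fig.show()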

Second, you can compare the models pairwise by computing how often each pair predicts the same class for the same instance.

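import pandas as pd
import plotly.express as px

df = pd.DataFrame(data)
# For each column x, compute the fraction of instances on which x agrees
# with every other column; the result is a square model-by-model matrix.
rate_of_same_prediction = df.apply(
    lambda x: [(x == df[col]).mean() for col in df.columns], axis=0)
# Label the rows with the model names as well.
rate_of_same_prediction.index = rate_of_same_prediction.columns
fig = px.imshow(rate_of_same_prediction)
fig.show()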

Here both the rows and the columns represent your models (plus true_class, so the true_class row or column gives each model's accuracy).
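For the specific check in the question (wherever model A predicts 0, do models B and C also predict 0?), a boolean mask answers it directly without a plot. A minimal sketch, assuming rf_pred plays model A and mlp_pred and knn_pred play models B and C (an illustrative choice), run on the five example rows:

```python
import pandas as pd

# The five example rows from the question.
data = {'true_class': [3.0, 3.0, 3.0, 3.0, 3.0],
        'rf_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'mlp_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'knn_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'lg_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'ada_pred': [2.0, 2.0, 2.0, 2.0, 2.0]}
df = pd.DataFrame(data)

# Illustrative roles: A = rf_pred, B = mlp_pred, C = knn_pred.
a_zero = df['rf_pred'] == 0.0
others_zero = (df['mlp_pred'] == 0.0) & (df['knn_pred'] == 0.0)

# Restrict to the instances A labelled 0 and check B and C there.
all_agree = bool(others_zero[a_zero].all())  # True for the example rows
share = others_zero[a_zero].mean()           # 1.0 for the example rows

# Instances where A predicted 0 but B or C did not (empty here).
disagreements = df[a_zero & ~others_zero]

# The same check against ada_pred, which never predicts 0 in the example.
ada_share = (df.loc[a_zero, 'ada_pred'] == 0.0).mean()  # 0.0

print(all_agree, share, len(disagreements), ada_share)
```

On the full 50,000-row test set, `disagreements` lists exactly the instances that break the pattern, which is often more useful than the single agreement rate.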

edited Jul 06 '22 at 11:52

answered Jul 06 '22 at 11:23

Søren


Why do you take the mean in `(x == df[col]).mean()`? – Jul 06 '22 at 12:18
Yes: `(x == df[col])` gives a Series of True/False values indicating, for each instance, whether the two models being compared (columns) predicted the same class. Its mean is the rate at which they agree: 1 means they agree on every instance and 0 means they never agree (True/False are treated as 1/0 when computing the mean). – Søren Jul 06 '22 at 13:03
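A small illustration of the point above, using two hypothetical prediction columns for four instances (the values are made up for the example):

```python
import pandas as pd

# Two hypothetical models' predictions for the same four instances.
model_a = pd.Series([3.0, 0.0, 0.0, 2.0])
model_b = pd.Series([3.0, 0.0, 1.0, 2.0])

same = model_a == model_b  # element-wise comparison: True, True, False, True
rate = same.mean()         # True/False count as 1/0, so 3 / 4 = 0.75

print(same.tolist(), rate)
```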

score 0 · Answer 2 · edited Jul 06 '22 at 14:57


import matplotlib.pyplot as plt

# Plot the true class against one model's predictions (df from the question)
plt.scatter(df['true_class'], df['rf_pred'])
plt.xlabel('true_class')
plt.ylabel('rf_pred')
plt.show()

You can also use the df.corr() method, or seaborn's regplot function.
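Because the predictions are class labels rather than continuous values, scatter plots, df.corr() and regplot all treat them as numbers, which can be misleading. A categorical alternative (a different technique from the ones above) is a contingency table built with pandas.crosstab, which counts how often each pair of predicted classes occurs. A minimal sketch on the five example rows:

```python
import pandas as pd

# The five example rows from the question.
data = {'true_class': [3.0, 3.0, 3.0, 3.0, 3.0],
        'rf_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'mlp_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'knn_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'lg_pred': [3.0, 0.0, 0.0, 0.0, 0.0],
        'ada_pred': [2.0, 2.0, 2.0, 2.0, 2.0]}
df = pd.DataFrame(data)

# Rows: classes predicted by rf_pred; columns: classes predicted by ada_pred.
# Each cell counts the instances that received that pair of predictions;
# cells whose row and column labels match are agreements.
pair_counts = pd.crosstab(df['rf_pred'], df['ada_pred'])
print(pair_counts)

# Against the true labels, the same call gives a confusion matrix.
confusion = pd.crosstab(df['true_class'], df['rf_pred'])
print(confusion)
```

With 50,000 instances, passing `normalize='index'` to crosstab turns the counts into per-row proportions, and the resulting DataFrame can be shown as a heatmap with px.imshow, like the agreement matrix in the other answer.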

edited Jul 06 '22 at 14:57

ChrisGPT was on strike


answered Jul 06 '22 at 11:25

Abdulraheem Quwam


`regplot` method? Can you add details to your answer? – Jul 06 '22 at 11:26
`import seaborn as sns`, then `sns.regplot(x=df['true_class'], y=df['rf_pred'])`; do the same for the other columns. – Abdulraheem Quwam Jul 06 '22 at 11:31
But that is a linear regression plot, which isn't suited to categorical predictions. – Jul 06 '22 at 12:22
