I have a single trained classifier tested on 2 related multiclass classification tasks. Because each trial in one task is related to the trial at the same index in the other, the 2 sets of predictions constitute paired data. I would like to run a paired permutation test to find out whether the difference in classification accuracy between the 2 prediction sets is significant.

So my data consists of 2 lists of predicted classes, where each prediction is related to the prediction in the other test set at the same index.

Example:

actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.

H0: There is no difference in classification accuracy between the 2 prediction sets.

How do I go about running a paired permutation test to test the significance of the difference in classification accuracy?

1 Answer

I have been thinking about this and I'm going to post a proposed solution and see if someone approves or explains why I'm wrong.

import random

actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10]  # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10]  # 50% acc.
paired_predictions = list(zip(predictions1, predictions2))  # [(1, 1), (3, 3), (6, 7), ...]

def accuracy(predictions):
    # proportion of predictions that equal the actual class at the same index
    return sum(p == a for p, a in zip(predictions, actual_classes)) / len(actual_classes)

actual_test_statistic = accuracy(predictions1) - accuracy(predictions2)  # 0.9 - 0.5 = 0.4
number_of_iterations = 10000
all_simulations = []  # empty list
for _ in range(number_of_iterations):
    # Swap the two predictions within each pair with probability 0.5.
    # Pairs keep their original index so they still line up with actual_classes;
    # shuffling the order of the pairs would break that alignment.
    swapped = [pair[::-1] if random.random() < 0.5 else pair for pair in paired_predictions]
    simulated_predictions1 = [pair[0] for pair in swapped]
    simulated_predictions2 = [pair[1] for pair in swapped]
    # Put the simulated difference in the list
    all_simulations.append(accuracy(simulated_predictions1) - accuracy(simulated_predictions2))

# Two-sided p-value: share of simulated differences at least as extreme as the observed one
p = sum(abs(d) >= abs(actual_test_statistic) for d in all_simulations) / number_of_iterations
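
One standard way to build the null distribution for paired data is to swap the two predictions within a trial, which flips the sign of that trial's correctness difference. With only ten trials, every sign pattern can be enumerated rather than sampled, so here is a minimal sketch of an exact version. The function name `exact_paired_permutation_test` is my own and does not come from any library, and the statistic is assumed to be the plain difference in accuracy.

```python
from itertools import product

def exact_paired_permutation_test(actual, preds_a, preds_b):
    """Exact two-sided paired permutation test on the accuracy difference.

    Hypothetical helper for illustration; returns (observed_difference, p_value).
    """
    # Per-trial difference in correctness: +1, 0, or -1.
    diffs = [int(a == y) - int(b == y) for y, a, b in zip(actual, preds_a, preds_b)]
    observed = sum(diffs)

    # Trials where both predictions are right, or both are wrong, contribute 0
    # whether or not they are swapped, so only discordant trials are enumerated.
    discordant = [d for d in diffs if d != 0]

    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(discordant)):
        simulated = sum(s * d for s, d in zip(signs, discordant))
        total += 1
        # Integer counts are compared to avoid floating-point ties.
        if abs(simulated) >= abs(observed):
            extreme += 1

    return observed / len(diffs), extreme / total


actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10]
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10]

observed_diff, p_value = exact_paired_permutation_test(
    actual_classes, predictions1, predictions2
)
print(observed_diff, p_value)
```

On the example data there are 4 discordant trials, all in favour of predictions1, so only 2 of the 16 sign patterns are as extreme as the observed one and the exact two-sided p-value is 2/16 = 0.125. Enumeration doubles in cost with every discordant trial, so for larger test sets the random-swap simulation is probably the more practical choice.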

If you have any thoughts, let me know in the comments. Or better still, provide your own corrected version in your own answer. Thank you!