Applying RAND index with cluster numbers and cluster labels

Question

I have a set of reviews that I clustered with k-means, so I know which cluster each review belongs to (e.g. 1, 2, 3, ...). I also have the true labels for the reviews (e.g. location, food, etc.), and I need to compare the two with the Rand index.

Since I have cluster numbers on one side and cluster labels on the other, how can I apply the Rand index to compare them?

Is there any intermediate step that I should follow?

Edit: I've seen the post Rand Index function (clustering performance evaluation) but it does not answer my question.

In that question, both lists are numeric:

labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]

but what I have is something like below,

labels_true = ['food', 'view', 'room', 'food', 'staff', 'staff']
labels_pred = [0, 0, 0, 1, 0, 1]

Any help is highly appreciated.

Does this answer your question? [Rand Index function (clustering performance evaluation)](https://stackoverflow.com/questions/49586742/rand-index-function-clustering-performance-evaluation) — Riccardo Bucco, Nov 25 '21 at 10:02
@RiccardoBucco Thank you for the comment but that is not exactly what I am looking for — lse23, Nov 25 '21 at 10:23

score 1 · Accepted Answer · answered Nov 25 '21 at 10:48


Just use the sklearn.metrics.rand_score function:

from sklearn.metrics import rand_score

rand_score(labels_true, labels_pred)

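It doesn't matter if the true labels and the predicted labels take values in different domains: the Rand index only checks, for each pair of samples, whether the two labelings agree on putting them in the same group or in different groups. Have a look at these examples: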

>>> rand_score(['a', 'b', 'c'], [5, 6, 7])
1.0
>>> rand_score([0, 1, 2], [5, 6, 7])
1.0
>>> rand_score(['a', 'a', 'b'], [0, 1, 2])
0.6666666666666666
>>> rand_score(['a', 'a', 'b'], [7, 7, 2])
1.0
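
To make it clearer why the label domains don't matter, here is a minimal sketch of the pair-counting idea behind the Rand index, applied to the lists from the question. The helper name `rand_index_pairs` is hypothetical and this is not how sklearn implements `rand_score`; it is only meant to illustrate the definition. The handling of inputs with fewer than two samples is an assumed convention.

```python
from itertools import combinations


def rand_index_pairs(labels_true, labels_pred):
    """Hypothetical helper: Rand index by explicit pair counting."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        # Only equality *within* each labeling is tested, so strings
        # and integers are never compared with each other.
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += same_true == same_pred
        total_pairs += 1

    # Assumed convention: with fewer than two samples there are no pairs,
    # so report perfect agreement.
    return agreements / total_pairs if total_pairs else 1.0


labels_true = ['food', 'view', 'room', 'food', 'staff', 'staff']
labels_pred = [0, 0, 0, 1, 0, 1]

print(rand_index_pairs(labels_true, labels_pred))  # 0.4 (6 of the 15 pairs agree)
```

Because the loop is quadratic in the number of samples, this is only practical for small inputs; for real data, `rand_score` is the better choice.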


Riccardo Bucco


It seems like Jaccard similarity cannot be applied when the true values and predicted values are in different domains. @Riccardo Bucco do you have an idea of how to handle this scenario? – lse23 Nov 25 '21 at 19:22
@lse23 please open another question :) – Riccardo Bucco Nov 25 '21 at 20:53
