The dataframe above holds the attributes used to determine whether a person has cancer. The class column indicates whether the person has cancer: class 2 means the person does not have cancer, and class 4 means the person has cancer. When I ran K-means on the dataframe after removing the class and id columns, I got a prediction of 0 or 1 for every row. Now I am confused about whether 0 or 1 corresponds to class 2. How can I figure this out, and how can I check the accuracy of my model?
I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut Mar 26 '21 at 08:07
1 Answer
The K-Means algorithm is a clustering algorithm, not a classifier. This means it does not give you a mapping from the features to the cancer class. It only finds clusters (subsets of related datapoints) in the feature space.
Hence the outputs 0/1 are the cluster memberships of each datapoint. The cluster numbers themselves are arbitrary and carry no meaning about the cancer class.
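To illustrate why the cluster numbers carry no class meaning, here is a minimal sketch of a toy one-dimensional 2-means loop. It is not scikit-learn's implementation; the function name `two_means_1d` and the sample values are made up for illustration. Swapping the initial centroids finds the same two groups but flips their numbers.

```python
def two_means_1d(values, init_centroids, n_iter=20):
    """Toy k-means on 1-D data (illustrative only, not a library API)."""
    centroids = list(init_centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value joins the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels


# Hypothetical data: two well-separated groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]

labels_a = two_means_1d(values, init_centroids=(0.0, 10.0))
labels_b = two_means_1d(values, init_centroids=(10.0, 0.0))

print(labels_a)  # [0, 0, 0, 1, 1, 1]
print(labels_b)  # [1, 1, 1, 0, 0, 0]
```

Both runs produce the same grouping; only the numbering differs, which is why 0/1 cannot be read as class 2/4 without comparing against the true labels.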
If you want to check whether the clusters correlate to the cancer classes, do an analysis:
- How many datapoints in cluster 0 are actually cancer class 2?
- How many datapoints in cluster 1 are actually cancer class 4?
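The counting described above could be sketched as follows. The lists `cluster_ids` and `true_classes` are hypothetical toy data standing in for the K-Means output and the class column; with a real dataframe, `pandas.crosstab` on the two columns should give the same table.

```python
from collections import Counter

# Hypothetical toy data: K-Means cluster id and true class per row.
cluster_ids = [0, 0, 1, 1, 1, 0, 1, 1]
true_classes = [4, 4, 2, 2, 2, 2, 4, 2]

# Count how often each (cluster, class) pair occurs.
counts = Counter(zip(cluster_ids, true_classes))

for cluster in sorted(set(cluster_ids)):
    for cls in sorted(set(true_classes)):
        print(f"cluster {cluster}, class {cls}: {counts[(cluster, cls)]}")
```

In this toy data, cluster 0 is mostly class 4 and cluster 1 is mostly class 2, which suggests which cluster corresponds to which class.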
Also take a look at the confusion matrix for information on how to evaluate this kind of problem.
Your confusion matrix should look like this:
+-----------------+-----------------------+-----------------------+
|                 | actual cancer class 4 | actual cancer class 2 |
+-----------------+-----------------------+-----------------------+
| k-Means class 0 | true positive         | false positive        |
| k-Means class 1 | false negative        | true negative         |
+-----------------+-----------------------+-----------------------+
- true positive: algorithm predicted cancer and person actually has cancer
- false positive: algorithm predicted cancer but person does not have cancer
- false negative: algorithm predicted no cancer but person actually has cancer
- true negative: algorithm predicted no cancer and person does not have cancer
- Take only the datapoints that are in cluster 0 and count how many of them have cancer class 4. These are your true positives.
- Again take only the datapoints in cluster 0 and count how many of them have cancer class 2. These are your false positives.
- Repeat for cluster 1 to get the negatives: class 4 gives the false negatives, class 2 the true negatives.
Accuracy can be calculated using this formula: acc = (TP+TN) / (TP+FP+FN+TN)
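As a rough sketch of the whole procedure, the code below counts the four cells and applies the formula. The helper names `confusion_counts` and `accuracy` and the toy data are hypothetical; the default mapping (cluster 0 means cancer) follows the table above and is an assumption that has to be checked for real data.

```python
def confusion_counts(cluster_ids, true_classes, cancer_cluster=0, cancer_class=4):
    """Return (TP, FP, FN, TN), treating `cancer_cluster` as 'predicted cancer'."""
    tp = fp = fn = tn = 0
    for cluster, cls in zip(cluster_ids, true_classes):
        predicted_cancer = cluster == cancer_cluster
        has_cancer = cls == cancer_class
        if predicted_cancer and has_cancer:
            tp += 1
        elif predicted_cancer:
            fp += 1
        elif has_cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # acc = (TP+TN) / (TP+FP+FN+TN)
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy data: K-Means cluster id and true class per row.
cluster_ids = [0, 0, 1, 1, 1, 0, 1, 1]
true_classes = [4, 4, 2, 2, 2, 2, 4, 2]

acc_cluster0_is_cancer = accuracy(*confusion_counts(cluster_ids, true_classes, cancer_cluster=0))
acc_cluster1_is_cancer = accuracy(*confusion_counts(cluster_ids, true_classes, cancer_cluster=1))

print(acc_cluster0_is_cancer)  # 0.75
print(acc_cluster1_is_cancer)  # 0.25
```

Because the cluster numbering is arbitrary, evaluating both mappings and keeping the one with the higher accuracy is one simple way to decide which cluster stands for which class.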

Sparkofska
I have checked the counts. Predicted: 1 -> 451, 0 -> 232. Original: 2 -> 444, 4 -> 239. Doesn't this mean that a person having the disease is denoted by 0 and a person not having the disease is denoted by 1? – Antony Joy Mar 26 '21 at 07:57
And how are you suggesting to use the confusion matrix for checking accuracy? – Antony Joy Mar 26 '21 at 07:58
To obtain the correlation, you have to count the cancer class 4 datapoints within k-means cluster 1, instead of comparing the total values. See my edit for details. – Sparkofska Mar 26 '21 at 08:20
Take some time and try to deeply understand how a confusion matrix works. If you get the theory, you will be able to apply it to your specific problem. – Sparkofska Mar 26 '21 at 08:22