Clustering vs supervised classification in the case of a very small dataset

Question

I'm trying to classify/cluster subjects into two classes, healthy and sick, according to 4 features.

Two things to know: I know the label/class of each subject, and I only have 40 subjects in total (training + testing set combined).

What should I choose in this case, clustering or classification?

If you have categorical variables, you had better choose classification. – Frayal, Aug 21 '18 at 13:32
I’m voting to close this question because it’s not about programming. – user438383, May 27 '22 at 03:06

score 1 · Accepted Answer · answered Aug 25 '18 at 12:05

Clustering vs classification is not a choice of method but a choice of problem. What is the problem at hand? You have labeled data and want a model that can label more: this is, by definition, classification. Which specific classification method to use is a whole new, research-driven question rather than a simple programming issue. In particular, many classifiers try to fit some sort of generative model to the data (and thus learn about its structure even without labels), but in the end the labels are there and should be used.

lejlot


So the fact that I have very few subjects does not count? – learners Aug 25 '18 at 17:06

It does not affect the fact that it is a **classification** problem, and it should be treated as such. What exactly the solution will be, and how much information about the labels is actually needed, is a separate issue. In particular, classification methods like kNN or Naive Bayes could do relatively well even in low-data regimes – lejlot Aug 25 '18 at 18:08
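
As a rough illustration of the low-data point in the comment above, here is a minimal sketch of a Gaussian Naive Bayes classifier scored with leave-one-out cross-validation, so that each of the 40 subjects is used as a test case exactly once. The data is synthetic and purely hypothetical: the `make_toy_data` helper, the class shift, and the 0/1 encoding for healthy/sick are assumptions for illustration, not the asker's data. The classifier is written in plain Python and is not meant to mirror any particular library's API.

```python
import math
import random


def make_toy_data(n_per_class=20, n_features=4, shift=1.5, seed=0):
    """Hypothetical stand-in for 40 subjects x 4 features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # assumed encoding: 0 = healthy, 1 = sick
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(X, y):
    """Train on all subjects but one, test on the held-out one, and repeat."""
    correct = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict_gaussian_nb(model, X[i]) == y[i]
    return correct / len(X)


X, y = make_toy_data()
loo_accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

Leave-one-out is used here only because it wastes no subjects on a fixed test split; with 40 subjects the resulting estimate is still noisy, so it should be read as a rough indication rather than a precise figure.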

score 0 · Answer 2 · edited Mar 27 '22 at 16:34

Clustering is based on unsupervised learning, and classification is based on supervised learning. Unsupervised learning is used when you don't have target labels; it groups the data into clusters. Supervised learning is used when you have labeled data. You mention that you have labels, so go for classification algorithms such as logistic regression or SVM. Also, with a small dataset you should watch out for overfitting; to limit it, prefer simple algorithms.
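
To make the overfitting concern concrete, here is a small sketch (plain Python, synthetic data) that compares the accuracy of a 1-nearest-neighbour classifier on its own training points with its leave-one-out accuracy. The 1-NN model is chosen only because it memorises the training set, which makes the gap easy to see; it is not one of the algorithms recommended in this answer. The `make_toy_data` helper and all of its parameters are hypothetical.

```python
import random


def make_toy_data(n_per_class=20, n_features=4, shift=1.0, seed=1):
    """Hypothetical stand-in for 40 subjects x 4 features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # assumed encoding: 0 = healthy, 1 = sick
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def nearest_neighbor_label(X_train, y_train, x):
    """1-NN: label of the closest training point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[distances.index(min(distances))]


X, y = make_toy_data()

# Resubstitution: score on the very points the model was "trained" on.
# Each point is its own nearest neighbour, so this looks perfect.
train_hits = sum(nearest_neighbor_label(X, y, x) == t for x, t in zip(X, y))
train_accuracy = train_hits / len(X)

# Leave-one-out: each subject is predicted without being in the training set.
loo_hits = sum(
    nearest_neighbor_label(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) == y[i]
    for i in range(len(X))
)
loo_accuracy = loo_hits / len(X)

print(f"Training accuracy:      {train_accuracy:.2f}")
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

The training accuracy here is perfect by construction, so only the held-out figure says anything about how the model would do on new subjects.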

score 0 · Answer 3 · answered May 27 '22 at 02:51

Classification is a type of supervised learning. In classification, the algorithm needs to predict from a finite set of outputs. For example, suppose the input data describes people and whether or not they took a credit card. The algorithm learns a pattern from the input data and the output column (took a credit card or not). Once trained, it predicts for unseen data whether a person will take a credit card. In this example there is only a finite number of outputs (2 in this case: take a credit card or not), so the problem can be solved using classification.

Clustering is a type of unsupervised learning. It mainly deals with data that is not labelled. A clustering algorithm separates the data into groups based on similar characteristics.
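
For contrast with classification, here is a minimal k-means sketch in plain Python. Note that the labels `y` are never passed to the clustering function; they are only used afterwards to check how well the two discovered groups happen to line up with healthy/sick. The synthetic data, the `make_toy_data` helper, and the choice of k-means itself are assumptions for illustration.

```python
import random


def make_toy_data(n_per_class=20, n_features=4, shift=1.5, seed=2):
    """Hypothetical stand-in for 40 subjects x 4 features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # assumed encoding: 0 = healthy, 1 = sick
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(X, k=2, n_iter=20, seed=0):
    """Plain k-means: alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]  # start from k random points
    assignments = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(x, centroids[c]))
            for x in X
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [x for x, a in zip(X, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


X, y = make_toy_data()
assignments = kmeans(X)  # the labels y are not used here

# Cluster ids are arbitrary (cluster 0 may be "sick"), so check both mappings.
match = sum(a == t for a, t in zip(assignments, y)) / len(y)
agreement = max(match, 1 - match)
print(f"Agreement between clusters and true labels: {agreement:.2f}")
```

Even when the agreement is high, the clusters carry no names of their own, which is one practical reason to use the known labels directly in a classifier when they are available.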
