How many labels are acceptable before using regression over classification

Question

I'm working on a supervised learning problem in Python. I have two data sets of x,y coordinates: in one, each coordinate pair has a known label; in the other, I have only the coordinates. I plan to train on the labelled set and predict labels for the unlabelled one. Because the labels are discrete, my approach is to use a classification algorithm (linear discriminant analysis). Although the labels are discrete, there are a lot of them (n=~80,000). My question: at what number of labels should I consider regression instead of classification, given that regression is better suited to continuous labels? I'm using SciKit as my machine learning package and astronml.org's excellent tutorial as a guide.

score 0 · Answer 1 · answered May 01 '16 at 13:05

It is not about the number of labels. It is about whether the labels are continuous. It does not matter if you have 80,000 classes or even more; as long as there is no correlation between neighbouring classes (e.g. class i and class i+1), you should use classification, not regression.
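A minimal sketch of why this matters, in plain Python with made-up points and class ids (all values and helper names here are hypothetical). A majority vote among the nearest neighbours always returns a real class, while averaging the neighbours' label ids, which is what a regression-style prediction amounts to, can produce a number that is not a class at all.

```python
from collections import Counter

# Toy training data: (x, y) points with arbitrary class ids (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1), (4.9, 5.1)]
labels = [7, 7, 42, 42, 7, 42]


def k_nearest(query, k=3):
    """Return the labels of the k training points closest to query."""
    def sq_dist(p):
        return (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2

    order = sorted(range(len(points)), key=lambda i: sq_dist(points[i]))
    return [labels[i] for i in order[:k]]


def classify(query, k=3):
    """Classification: majority vote, so label ids are never averaged."""
    return Counter(k_nearest(query, k)).most_common(1)[0][0]


def regress(query, k=3):
    """Regression-style prediction: mean of the neighbours' label ids."""
    neighbours = k_nearest(query, k)
    return sum(neighbours) / len(neighbours)


# A query lying between the two clusters, slightly closer to the first one.
query = (2.4, 2.4)
vote = classify(query, k=5)  # one of the existing class ids
mean = regress(query, k=5)   # a blend of ids that depends on the arbitrary numbering
```

Here the vote picks an existing class, whereas the mean falls between 7 and 42 and would change if the classes were simply renumbered, even though the data stayed the same.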

Regression only makes sense when the labels are continuous (e.g. real numbers), or at least when there is a correlation between adjacent classes (e.g. when the labels are counts of something, you can do regression and then round the results to the nearest integer).
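The count case can be sketched with scikit-learn as follows. The data is synthetic and the choice of `LinearRegression` is an assumption made purely for illustration; any regressor could take its place. The point is only the pattern: fit on integer counts, predict a continuous value, then round.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): the label is a count
# that grows with the coordinates, so neighbouring label values are related.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
counts = np.rint(3 * X[:, 0] + 2 * X[:, 1]).astype(int)

model = LinearRegression().fit(X, counts)

# Predict continuous values, then round to the nearest valid count.
X_new = np.array([[1.0, 1.0], [4.0, 2.5]])
predicted = np.rint(model.predict(X_new)).astype(int)
```

Rounding is reasonable here only because a prediction of 16.8 is genuinely close to the label 17. With unordered class ids the same step would be meaningless.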
