I have a problem where I'm trying to use supervised learning in Python. I have two data sets of x,y coordinates: in one, I know the label each point belongs to; in the other, I have only the coordinates. I am going to use the labelled set to train a model that predicts labels for the other. My approach is supervised learning with a classification algorithm (linear discriminant analysis), since the labels are discrete. Although they are discrete, there are a large number of them (n ≈ 80,000). My question: at what number of labels should I consider regression over classification, given that regression is better suited to continuous labels? I'm using scikit-learn as my machine learning package and astroML.org's excellent tutorial as a guide.
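For reference, the workflow described above (fit on the labelled coordinates, predict labels for the unlabelled ones) might look roughly like this in scikit-learn. This is a minimal sketch: the three clusters, the sample sizes and the variable names are hypothetical stand-ins for the real data sets.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical labelled set: (x, y) points scattered around 3 centres, labels 0..2
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
labels = np.repeat(np.arange(3), 50)
xy_labelled = centers[labels] + rng.normal(scale=1.0, size=(150, 2))

# Hypothetical unlabelled set: coordinates only
xy_unlabelled = np.array([[0.2, -0.1], [5.1, 4.8], [9.7, 0.3]])

# Train on the labelled set, then predict labels for the unlabelled one
clf = LinearDiscriminantAnalysis()
clf.fit(xy_labelled, labels)
predicted = clf.predict(xy_unlabelled)
print(predicted)  # should print [0 1 2]
```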
Answer
It is not about the number of labels; it is about whether they are continuous. It does not matter whether you have 80,000 classes or even more: as long as there is no relationship between neighbouring classes (e.g. class i and class i+1), you should use classification, not regression.
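One way to see why the class numbering matters is to renumber the classes arbitrarily and refit. The sketch below (synthetic data, a hypothetical permutation `perm`, and plain `LinearRegression` with rounding standing in for "regression") shows that LDA's accuracy does not depend on how the classes are numbered, while a regressor's does, because it assumes class 3 lies "between" classes 2 and 4.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: 5 classes whose centres lie along the x axis
centers = np.array([[10.0 * k, 0.0] for k in range(5)])
y = np.repeat(np.arange(5), 100)
X = centers[y] + rng.normal(scale=1.0, size=(500, 2))

# Renumber the classes arbitrarily: class k becomes perm[k]
perm = np.array([3, 0, 4, 1, 2])
y_perm = perm[y]

def clf_accuracy(labels):
    # Classification treats labels as unordered categories
    clf = LinearDiscriminantAnalysis().fit(X, labels)
    return np.mean(clf.predict(X) == labels)

def reg_accuracy(labels):
    # Regression treats labels as numbers, then rounds to the nearest class id
    reg = LinearRegression().fit(X, labels)
    pred = np.clip(np.rint(reg.predict(X)), 0, 4).astype(int)
    return np.mean(pred == labels)

acc_clf = (clf_accuracy(y), clf_accuracy(y_perm))
acc_reg = (reg_accuracy(y), reg_accuracy(y_perm))
print(acc_clf)  # both close to 1.0
print(acc_reg)  # close to 1.0 for y, much lower for y_perm
```

With the original numbering the labels happen to increase with x, so regression gets lucky; after renumbering, only the classifier still works.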
Regression only makes sense when the labels are continuous (e.g. real numbers), or at least when adjacent classes are correlated (e.g. when the labels are counts of something, you can fit a regression and then round the predictions to the nearest integer).
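For the count case, "regress, then round" might look like the sketch below. The Poisson-distributed counts, the linear relationship to the coordinates, and the mean-count baseline are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical count labels: the expected count grows linearly with x and y
X = rng.uniform(0, 10, size=(500, 2))
counts = rng.poisson(1 + X[:, 0] + 0.5 * X[:, 1])

X_train, X_test = X[:250], X[250:]
c_train, c_test = counts[:250], counts[250:]

# Fit a regression on the counts, then round to the nearest non-negative integer
reg = LinearRegression().fit(X_train, c_train)
pred = np.clip(np.rint(reg.predict(X_test)), 0, None).astype(int)

# Compare with a baseline that always predicts the (rounded) mean training count
mae_model = np.mean(np.abs(pred - c_test))
mae_baseline = np.mean(np.abs(int(np.rint(c_train.mean())) - c_test))
print(mae_model, mae_baseline)
```

Here neighbouring labels (e.g. 7 and 8 items) really are close in meaning, so an error of one count is a small error, which is exactly what regression exploits.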

Hossein