0

I am new to Machine learning and I have this basic question. As I am weak in Math part of the algorithm I find it difficult to understand this.

When you are given a task to design a classifier(keep it simple -- a 2 class classifier) using unsupervised learning(no training samples), how to decide what type of classifier(linear or non-linear) to use? If we do not know this, then the importance on feature selection(which means indirectly knowing what the data set is) becomes very critical. Am I thinking in the right direction or is there something big that I dont know. Insight into this topic is greatly appreciated.

ShakHub
  • 43
  • 8

2 Answers2

1

classification is by definition a "supervised learning" problem. such models require examples of points within given classes to understand how to separate the classes from one another. if you are simply looking for relationships between unlabeled data points, you're solving an unsupervised problem. look into clustering algorithms. k-means is where a lot of people start.

hope this helps!

follyroof
  • 3,430
  • 2
  • 28
  • 26
0

This is a huge problem. Yes, the term "clustering" is the best entry point for googling about that, but I understand that you want to train a classifier, where "training" means optimizing an objective function with parameters. The first choice is definitely not discriminative classifiers (such as linear ones), because with them, the standard maximum likelihood (ML) objective does not work without labels. If you absolutely want to use linear classifiers, then you have to tweak the ML objective, or better use another objective (approximating the classifier risk). But an easier choice is to rather look at generative models, such as HMMs, Naive Bayes, Latent Dirichlet Allocation, ... for which the ML objective works without labels.

xtof54
  • 1,233
  • 11
  • 15