
In a typical machine learning problem you have many features (e.g. if you are building an image recognizer), and with many features you can't visualise the data (you can't plot a graph). Without plotting a graph, is there a way to determine what degree of hypothesis function to use for a given problem? How do you determine the best hypothesis function to use? For example:

if there are 2 inputs x(1) and x(2),

whether to choose w(0) + w(1)*x(1) + w(2)*x(2) as the hypothesis function, or

w(0) + w(1)*x(1) + w(2)*x(2) + w(3)*x(1)*x(2) + w(4)*x(1)^2 + w(5)*x(2)^2

as the hypothesis function, where w(0), w(1), w(2), w(3), ... are weights.
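For concreteness, the two candidate hypotheses above can be written as plain Python functions (an illustrative sketch; the weights are passed in as a list w, indexed as in the question):

```python
def h_linear(x1, x2, w):
    # degree-1 hypothesis: w(0) + w(1)*x(1) + w(2)*x(2)
    return w[0] + w[1] * x1 + w[2] * x2

def h_quadratic(x1, x2, w):
    # degree-2 hypothesis with the interaction and square terms
    return (w[0] + w[1] * x1 + w[2] * x2
            + w[3] * x1 * x2 + w[4] * x1 ** 2 + w[5] * x2 ** 2)
```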

sachira
    Normally, you'd either evaluate various feature sets (or kernels) on a test set and measure something like accuracy or F1-score. But this question is really off-topic; try http://metaoptimize.com/qa or http://stats.stackexchange.com/ – Fred Foo Oct 11 '12 at 09:50
  • @larsmans That is how you train your hypothesis function, right? I want to know how to choose the degree of the hypothesis function. If one chooses a 2nd-degree hypothesis function, using the training set you can get the optimum 2nd-degree hypothesis function (by training). If one chooses a 3rd-degree hypothesis function, by training you can get the optimum 3rd-degree hypothesis function. But the optimum 2nd-degree hypothesis function might be better or worse than the optimum 3rd-degree one. I want to know how to choose the optimum "degree" for the hypothesis function. – sachira Oct 11 '12 at 10:09
  • No, you train on a training set. Then you test on a held-out test set to see how well your optimal solution for the training set generalizes to unseen data. – Fred Foo Oct 11 '12 at 10:36
  • @larsmans Yes. But training is done to optimize a given/chosen hypothesis, right? How do you find whether a 2nd-degree, 3rd-degree, ..., or nth-degree hypothesis would be better? (n = 1, 2, 3, 4, ...) – sachira Oct 11 '12 at 14:11
  • By what I said in my first answer, evaluating it on a test set. I can keep repeating what I said, but maybe you'd better read a book on machine learning, e.g. [*ESL*](http://www-stat.stanford.edu/~tibs/ElemStatLearn/). – Fred Foo Oct 11 '12 at 14:22

1 Answer


The first major step is feature selection or feature extraction (dimensionality reduction). This is a pre-processing step that you can apply using relevance metrics like correlation or mutual information (as in mRMR). There are also methods from numerical linear algebra and statistics, such as principal component analysis, for finding features that describe the space under certain assumptions.
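As a minimal sketch of such a relevance filter (toy data and feature names are made up for illustration), features can be ranked by the absolute value of their Pearson correlation with the target:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Rank features by |correlation| with the target (a simple relevance filter).
features = {"f1": [1, 2, 3, 4], "f2": [4, 1, 3, 2]}
target = [2, 4, 6, 8]
ranked = sorted(features, key=lambda f: -abs(pearson(features[f], target)))
print(ranked[0])  # f1 is perfectly correlated with the target
```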

Your question relates to a major concern in the field of machine learning known as model selection. The only way to know which degree to use is to experiment with models of different degrees (d = 1, d = 2, ...), keeping in mind the following:
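That selection loop can be sketched in pure Python (toy 1-D data; least squares via the normal equations — the dataset and degrees are illustrative, not part of the answer):

```python
def design_matrix(xs, degree):
    # one row per sample: [1, x, x^2, ..., x^degree]
    return [[x ** d for d in range(degree + 1)] for x in xs]

def solve(A, b):
    # Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_poly(xs, ys, degree):
    # least squares via the normal equations: (X^T X) w = X^T y
    X = design_matrix(xs, degree)
    k = degree + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * y for row, y in zip(X, ys)) for a in range(k)]
    return solve(XtX, Xty)

def mse(xs, ys, w):
    preds = [sum(wd * x ** d for d, wd in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Toy data from y = x^2 + 1; train and validation points are disjoint.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [x ** 2 + 1 for x in train_x]
val_x = [0.5, 1.5, 2.5, 3.5]
val_y = [x ** 2 + 1 for x in val_x]

# Fit each candidate degree on the training set, score on the validation set,
# and keep the degree with the lowest validation error.
errs = {d: mse(val_x, val_y, fit_poly(train_x, train_y, d)) for d in (1, 2, 3)}
best = min(errs, key=errs.get)
```

On this noiseless toy data the quadratic fits the validation points (near-)perfectly, so a straight line loses; on real, noisy data the same loop is what penalizes degrees that overfit.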

1- Overfitting: you need to avoid overfitting by limiting the ranges of the variables (the Ws in your case). This solution is known as regularization. Also, try not to train the classifier for too long, as in the case of ANNs (early stopping).
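One common way to "limit the ranges of the Ws" is an L2 (ridge) penalty added to the training loss. A minimal sketch (the feature layout with a leading 1 for the bias is an assumption for illustration):

```python
def ridge_loss(w, xs, ys, lam):
    # xs: list of feature vectors, each with a leading 1 for the bias term w[0]
    preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    data_term = sum((p - y) ** 2 for p, y in zip(preds, ys))
    # penalize large weights; the bias is conventionally left unpenalized
    penalty = lam * sum(wi ** 2 for wi in w[1:])
    return data_term + penalty
```

Minimizing this instead of the plain squared error pushes the weights toward zero, which is what keeps a high-degree hypothesis from contorting itself to fit noise.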

2- Preparing training, validation and testing sets. Training is for fitting the model, validation is for tuning the parameters, and testing is for comparing different models.
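A minimal sketch of such a split (the 60/20/20 proportions and fixed seed are illustrative choices, not a rule):

```python
import random

def split_dataset(data, seed=0, frac_train=0.6, frac_val=0.2):
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_val = int(len(shuffled) * frac_val)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # everything left over
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```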

3- Proper choice of the performance evaluation metric. If your training data is not well balanced (i.e. the number of samples per value or class label of your target variable differs substantially), then accuracy is not indicative. In that case, you may need to consider sensitivity, specificity or the Matthews correlation coefficient.
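To see why accuracy misleads on imbalanced data, here is a small sketch (toy labels invented for illustration): a classifier that always predicts the majority class scores 90% accuracy yet has zero sensitivity.

```python
import math

def confusion_counts(y_true, y_pred):
    # binary labels: 1 = positive, 0 = negative
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)        # true positive rate (recall)

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)        # true negative rate

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0 when the denominator vanishes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# 10 positives, 90 negatives; the classifier always predicts 0.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print((tp + tn) / len(y_true))           # accuracy: 0.9, looks good
print(sensitivity(tp, tn, fp, fn))       # sensitivity: 0.0, reveals the failure
```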

Experimentation is the key, and you are indeed limited by resources. Nevertheless, a properly designed experiment can serve your purpose.

soufanom