Better use of Regularization

Question

I have recently been studying machine learning with the Coursera ML course, and some questions came up while I was learning about the cost function with regularization. Please give me your advice if you have any ideas.

If I have enough training data, I think regularization would reduce accuracy, because the model can already produce reliable, generalized output from the training set alone, without regularization. How can I decide whether or not I should use regularization?
Let's suppose we have a model as follows: w3*x3 + w2*x2 + w1*x1 + w0, and x3 is the feature that particularly causes overfitting, meaning it has more outliers. In this situation, regularization seems somewhat unreasonable to me, because it affects every weight. Do you know of a better approach I could use in this case?
What is the best way to choose the value of lambda? I guess the simplest way is to train several times with different lambda values and compare the resulting training accuracy. However, this is definitely inefficient when we have a huge amount of training data. I want to know how you choose the ideal lambda value.

Thanks for reading!

This is too broad of a question and is better suited for [Cross Validated](http://stats.stackexchange.com/) or some other site since you're not asking about programming. — Tchotchke, Dec 08 '16 at 13:40
@Tchotchke Thanks for your advice. I should have chosen a more appropriate site for my question. — Curt, Dec 09 '16 at 09:21
You might be interested in [my overview over regularization](https://arxiv.org/pdf/1707.09725.pdf#page=82) (page 68ff and 85) — Martin Thoma, Aug 01 '17 at 06:20

score 0 · Accepted Answer · edited May 23 '17 at 10:30

It's a bad idea to guess before you evaluate your model on validation data. When you talk about 'accuracy' in your question, which accuracy do you mean? Training-set accuracy is not very useful for estimating how good your model is. Regularization is normally desirable for many families of ML algorithms, and for linear regression it is definitely worth doing. The only question is how much to apply, i.e. the value of the lambda parameter. Also, you might want to try L1 instead of L2. Read this.
In machine learning, questions like this are normally answered using data. Try a model, investigate how it behaves, try different solutions for the issues you observe.
Read this: How to calculate the regularization parameter in linear regression
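
To make the points above concrete, here is a minimal sketch of comparing lambda values on held-out validation data rather than on training accuracy. It is only an illustration under assumptions of my own, not the course's code: the toy data, the helper names (`fit_l2`, `mse`, `make_data`), the learning rate, and the candidate lambda list are all hypothetical. It uses plain batch gradient descent on the L2-regularized squared-error cost, with the bias weight left unpenalized, and includes lambda = 0 so that "no regularization" is judged by the same validation error.

```python
import random

def predict(w, x):
    # w[0] is the bias; w[1:] pair up with the features in x.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_l2(X, y, lam, lr=0.1, epochs=3000):
    """Batch gradient descent on
    J(w) = (1/2n) * (sum of squared errors + lam * sum_{j>=1} w_j^2)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        for j in range(1, d + 1):
            grad[j] += lam * w[j]  # the bias w[0] is not penalized
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def mse(w, X, y):
    return sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

def make_data(n, rng):
    # Hypothetical toy data: y is linear in x plus noise; x^2 and x^3 are extra features.
    X, y = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        X.append([x, x ** 2, x ** 3])
        y.append(2.0 * x + rng.gauss(0.0, 0.3))
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(20, rng)
X_val, y_val = make_data(20, rng)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]  # lam = 0.0 means no regularization
val_errors = {}
for lam in lambdas:
    w = fit_l2(X_train, y_train, lam)
    val_errors[lam] = mse(w, X_val, y_val)
    print(f"lambda={lam:<5} train MSE={mse(w, X_train, y_train):.4f} "
          f"val MSE={val_errors[lam]:.4f}")

best_lam = min(val_errors, key=val_errors.get)
print("lambda with lowest validation error:", best_lam)
```

The choice is made on `val_errors`, not on the training error, since a smaller lambda will generally look at least as good on the training set. With a large data set, a common compromise is to run this kind of search on a subsample or over a coarse grid of lambda values first.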
