
When we use polynomial features with polynomial regression, logistic regression, or SVMs, does the loss function become non-convex?

1 Answer


If a loss function is convex for any choice of the X -> y problem you're trying to estimate, then adding a fixed set of polynomial features won't change that. You're simply trading your initial problem for the estimation problem X' -> y, where X' has the additional features.
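For instance, here's a minimal numpy sketch of that point (the data and the cubic basis are arbitrary illustrative choices): with a fixed polynomial basis, the least-squares loss is quadratic in the weights with a positive-semidefinite Hessian, hence convex regardless of the data.

```python
import numpy as np

# Sketch: a FIXED polynomial basis keeps the least-squares loss convex
# in the weights. The data and degree here are made up for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2 * x**3 - x + rng.normal(scale=0.1, size=50)

X = np.vander(x, 4)  # columns x^3, x^2, x, 1 -- the augmented X'

# L(w) = ||X w - y||^2 is quadratic in w with constant Hessian 2 X^T X,
# which is positive semidefinite, so L is convex no matter the data.
H = 2 * X.T @ X
print(np.linalg.eigvalsh(H).min())  # smallest eigenvalue is >= 0
```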

If you're additionally trying to estimate the parameters inside the new feature(s), then it's pretty easy to get a non-convex loss in those dimensions (assuming there are parameters to choose -- if you're just adding a fixed polynomial basis, this doesn't apply).

As some measure of proof, take a 1D estimation problem and choose the feature f(x) = (x-a)^3. Assume your dataset has the single point (0, 0). With a little work you can show that the squared-error loss of linear regression over this feature is non-convex in places with respect to the parameter a: jointly in a and the regression weight the Hessian is indefinite, and with the weight and an intercept held fixed it is non-convex along the a-axis alone. Note that the loss IS still convex with respect to the new feature -- standard linear regression always satisfies that property -- it's the fact that we used linear regression along with a parametrized choice of polynomial to build a new non-convex regressor that causes this behavior.
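For concreteness, here's one way to check that claim with sympy (a minimal sketch; the evaluation points w = 1, a = 1, and a = 1/2 are arbitrary choices):

```python
import sympy as sp

# Sketch of the claim: single data point (x, y) = (0, 0),
# feature f(x) = (x - a)**3, model y_hat = w * f(x).
w, a = sp.symbols('w a', real=True)
loss = (w * (0 - a)**3 - 0)**2  # squared error; simplifies to w**2 * a**6

# Jointly in (w, a) the Hessian is indefinite, so the loss is non-convex.
H = sp.hessian(loss, (w, a)).subs({w: 1, a: 1})
print(H, H.det())  # det = -84 < 0 -> indefinite Hessian

# With the weight and an intercept held fixed (w = 1, b = 1), the loss is
# non-convex along the a-axis alone: its second derivative goes negative.
loss_b = ((0 - a)**3 + 1)**2
print(sp.diff(loss_b, a, 2).subs(a, sp.Rational(1, 2)))  # -33/8 < 0
```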

Hans Musgrave
  • `With a little work you can show that the loss even for linear regression is non-convex` -- how can the `linear regression` loss be non-convex? It is never a non-convex function; only the domain (X) might be non-convex, which would mean the problem itself has no solution via unconstrained optimization. – 4.Pi.n Jul 16 '20 at 19:16
  • The `sum of squared residuals` is a **strictly convex** function by definition. – 4.Pi.n Jul 16 '20 at 19:18
  • The non-convexity comes from a little side-stepping. There's no argument that the sum of squares residual is convex in X, y, and any weights. If those variables are defined in terms of others though (as in the example `(x-a)^3`) then the loss can still be non-convex _in those additional variables_. – Hans Musgrave Jul 16 '20 at 19:25
  • For a strictly convex function you always have a unique solution, and we are talking about the function itself, not the learning rate, which we could easily get rid of by choosing an appropriate one (using line search, for example). Your point is totally incorrect; a clear example of convexity is the least-squares solution via the `Normal Equation`, where there's no need to choose a learning rate, and for an iterative solution using a second-order approximation, the choice of learning rate is not a problem at all. – 4.Pi.n Jul 16 '20 at 19:31
  • The answer was quite general and had nothing to do with learning rates, unique solutions, or anything else in your comment. If you (1) add polynomial features and (2) appropriately parametrize those features then the loss can be non-convex in those parameters. The original question didn't specify how the polynomial features were chosen, so it feels appropriate to cover that possibility. Are you objecting to my choice of covering it, or do you disagree with the asserted non-convexity? – Hans Musgrave Jul 16 '20 at 19:37
  • `The **loss** even for linear regression is non-convex in places` -- you are talking about the loss function here, not the domain. I disagree with the asserted non-convexity: the loss function of linear regression will never be **non-convex**; it is strictly convex. And even if you are talking about the problem domain (input space): if your domain is not convex, and cannot be transformed into a convex (or concave) one, then you can't use **linear regression**. – 4.Pi.n Jul 16 '20 at 19:46
  • Also, it is rare to find a problem domain that is 100% non-convex or 100% convex, but if you don't apply any kind of transformation, the convexity assumption on the problem domain has to hold. – 4.Pi.n Jul 16 '20 at 19:50
  • Mm, I think I see the point of confusion. The problem is that it's _not_ linear regression with respect to the new parameters (only with respect to the new feature). I'll update my answer to reflect that. – Hans Musgrave Jul 16 '20 at 20:11