How should one decide between using a linear regression model or a non-linear regression model?

Question

How should one decide between using a linear regression model or non-linear regression model?

My goal is to predict Y.

With a simple dataset of one x and one y, I can easily decide which regression model to use by drawing a scatter plot.

In the multivariate case, with x1, x2, ..., xn and y, how can I decide which regression model to use? That is, how do I decide between a simple linear model and non-linear models such as quadratic, cubic, etc.?

Is there any technique, statistical approach, or graphical plot that helps to infer and decide which regression model to use? Please advise.

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

That is a pretty complex question.

Start visually. If the data satisfy the conditions of the classical linear model (including approximately normally distributed errors), use a linear model. I normally start by making a scatter plot matrix to observe the relationships. If a relationship is obviously non-linear, use a non-linear model. In practice I often rely on visual inspection, as long as there are not too many factors. For example, this would be a non-linear model:

However, if you want to use data mining (and computationally demanding methods), I suggest starting with stepwise regression. First choose a model evaluation criterion, for example R^2. Then start with an empty model and sequentially add predictors, or combinations of them, until the criterion is "maximized". Be aware, though, that in a least-squares fit adding a predictor never decreases R^2 on the data used for fitting, so maximizing it this way leads to over-fitting.
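The over-fitting problem with in-sample R^2 can be seen in a small experiment. The following is only a sketch on simulated data (the data, the seed, and the helper name `r_squared` are assumptions made for illustration): y depends on x1 alone, yet R^2 keeps creeping up as pure-noise predictors are added.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)                  # the only real predictor
y = 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=(n, 5))          # predictors unrelated to y

# Fit nested models: x1 alone, then x1 plus 1..5 noise columns.
r2_values = []
for k in range(noise.shape[1] + 1):
    X = np.column_stack([x1, noise[:, :k]])
    r2_values.append(r_squared(X, y))

print(np.round(r2_values, 4))
```

Because each model contains the previous one, the printed values cannot go down, even though none of the added columns carries any information about y.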

The solution is to split the data into a training set and a testing set. Build each candidate model on the training set and evaluate its mean error on the testing set. The best model is the one that minimizes the mean error on the testing set.
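As a rough illustration of this procedure, the sketch below compares linear, quadratic and cubic fits by their mean squared error on a held-out test set. Everything in it is an assumption for the sake of the example: the simulated data (which has a genuinely quadratic term), the 70/30 split, and the helper names `poly_features` and `holdout_mse`. Cross terms between predictors are left out to keep it short.

```python
import numpy as np

def poly_features(X, degree):
    """Intercept plus powers 1..degree of every column (no cross terms)."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.column_stack(cols)

def holdout_mse(X_train, y_train, X_test, y_test, degree):
    """Fit by least squares on the training set, return MSE on the test set."""
    coef, *_ = np.linalg.lstsq(poly_features(X_train, degree), y_train, rcond=None)
    pred = poly_features(X_test, degree) @ coef
    return float(np.mean((y_test - pred) ** 2))

rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 2))
# Simulated response: linear in x2, quadratic in x1, plus noise.
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=n)

idx = rng.permutation(n)
train, test = idx[:210], idx[210:]       # 70% training, 30% testing

results = {d: holdout_mse(X[train], y[train], X[test], y[test], d) for d in (1, 2, 3)}
best_degree = min(results, key=results.get)
print(results, best_degree)
```

On data like this the purely linear model should show a clearly larger test error than the quadratic one, which is the signal to move away from the simple linear model.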

If your data is sparse, try incorporating ridge or lasso regression into the model evaluation.
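To give an idea of what ridge regression does, here is a minimal sketch using its closed-form solution. It assumes that "sparse" means few observations relative to the number of predictors; the simulated data, the penalty values, and the helper name `ridge_coefficients` are hypothetical. Lasso has no closed form and is not shown.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution on standardized predictors and centered y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

rng = np.random.default_rng(1)
n, p = 30, 20                            # few observations, many predictors
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# Larger penalties shrink the coefficient vector toward zero.
lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_coefficients(X, y, lam))) for lam in lambdas]
print(dict(zip(lambdas, np.round(norms, 3))))
```

The penalty value itself would be chosen the same way as the model form, by comparing error on data that was not used for fitting.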

Again, this is a fairly complex question. The answer also depends on whether you are building a descriptive or an explanatory model.
