
I have a data set consisting of about 10 independent variables (1000 rows x 10 columns).

I know that all of them should have a positive contribution to my target variable.

When I run a multivariate linear regression on this, I get some negative coefficients. Does this mean those attributes are supposedly making a negative contribution, and that my model is therefore incorrect (since they should all contribute positively)?

Any help appreciated. Thanks, J

J. Warrington

4 Answers


First, question how you know that the variables all make positive contributions. How do you support that statement? Second, how did you determine that the 10 variables are statistically independent?

If they are not truly independent, then it's possible to see this apparent contradiction. Although each of the ten may have a positive contribution, it's easy to build a case in which a combination over-contributes.

Consider a, b, and c, where a and c have a light positive correlation with each other, and b has a higher correlation with each. If any one of them increases, the output increases. However, if all three increase, it's quite possible that a simple linear combination will increase too much from a and c both rising; since b rises along with both of them, giving b a negative coefficient can balance that over-contribution. In other terms, since the "winning team" is far too strong, b defects to the opponents to keep the game properly balanced. :-)
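
To make that concrete, here is a minimal sketch in Python (toy variables a, b, and c of my own construction, not the asker's data): every predictor correlates positively with the target, yet ordinary least squares gives b a negative coefficient, because b is used to cancel the shared over-contribution of a and c.

```python
# Minimal sketch of the "defecting teammate" effect (toy data, not the
# asker's): all three predictors correlate positively with the target,
# yet OLS assigns b a negative coefficient.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000

a = rng.normal(size=n)
c = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)

y = a + c + noise   # a and c each push the target up
b = a + c - noise   # b rises with a and c, so it also correlates with y

X = np.column_stack([a, b, c])

# Every predictor has a positive correlation with the target...
print([round(float(np.corrcoef(X[:, j], y)[0, 1]), 2) for j in range(3)])

# ...but the fitted coefficients come out near [2, -1, 2]: b "defects"
# to offset the over-contribution of a and c.
print(LinearRegression().fit(X, y).coef_)
```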

Does that clarify the problem? Does it match the problem?

Prune

Your model is fine. It can have negative weights. The weights are relative contributions: they show how much effect one feature has compared to the others.

A negative constant (intercept) should not be a problem either. It means that the expected value of your dependent variable would be less than 0 when all independent features are set to 0. For some correlated features that is exactly what you would expect: for example, if the mean value of your features is negative, a negative constant is natural; on the contrary, a positive value there would be the problematic one.

Even when the dependent variable is always positive, the constant can still come out negative. For example, consider an independent feature that has a strongly positive correlation with the dependent feature:

The values of the dependent feature are positive, in the range 1-10;
the values of the independent feature are positive, in the range 200-210.

In this case the regression line crosses the x-axis somewhere between x = 0 and x = 200, which results in a negative value for the constant; i.e., the regression line passes from the fourth quadrant into the first.
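
As a quick illustration (a made-up single-feature example, not from the answer itself), fitting data in exactly these ranges produces a large negative intercept:

```python
# Sketch: strictly positive data can still yield a negative constant.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.linspace(200, 210, 50).reshape(-1, 1)  # independent feature: 200-210
y = 1 + 0.9 * (x.ravel() - 200)               # dependent feature: about 1-10

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)  # slope ~0.9, intercept ~-179
```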

saurabh agarwal

The most likely cause is correlation between the variables, due to the limited sample size and noise in the system. Only with infinite data would the estimated correlation between truly independent variables come out to exactly zero; the smaller the sample size, the larger the error in estimating the correlation.

1) Try calculating the correlations of the variables on your 1000 examples (see the sketch below). 2) My intuition is that your negative weights should be fairly small compared to the positive ones, and as the sample size increases, the likelihood of a negative weight decreases.
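
For step 1, a minimal sketch (the DataFrame and its columns are stand-ins for the asker's data): even for variables generated independently, the sample correlations will not be exactly zero.

```python
# Sketch: pairwise correlations among predictors are noisy at n = 1000.
# The DataFrame below is stand-in data, not the asker's.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 10)),
                  columns=[f"x{i}" for i in range(10)])

# Off-diagonal entries hover around roughly +/-0.03, not exactly zero,
# even though the columns were generated independently.
print(df.corr().round(2))
```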

Just curious: what are your 10 variables, and how did you judge that they are independent?


This happened to me. I had positive correlations but negative weights in a linear regression, with no apparent explanation: the data showed no collinearity, and the result could not be rationalized. It simply didn't make sense.

In my case, the cause was a messed-up Pandas DataFrame index. After I applied df.reset_index(), the variables behaved as expected and the problem was solved.
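
For what it's worth, here is a minimal sketch of that class of failure, assuming pandas index alignment was involved (the exact pipeline behind the answer may have differed):

```python
# Sketch: a scrambled DataFrame index can silently misalign rows once
# the data is converted to plain arrays for a regression.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame({"x": rng.normal(size=200)})
y = 2 * X["x"]                                 # target perfectly tracks x

X_shuffled = X.sample(frac=1, random_state=1)  # rows reordered, index scrambled

# pandas still aligns on the index, so this difference is exactly zero:
print(float((y - 2 * X_shuffled["x"]).abs().max()))   # 0.0

# ...but converting to arrays drops the index, so rows no longer match:
print(np.corrcoef(X_shuffled["x"].to_numpy(), y.to_numpy())[0, 1])

# Restoring a clean positional order (e.g., sort_index, or reset_index
# on consistently ordered frames) brings the correlation back to 1:
X_fixed = X_shuffled.sort_index()
print(np.corrcoef(X_fixed["x"].to_numpy(), y.to_numpy())[0, 1])  # ~1.0
```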

razimbres