Multicollinearity

In statistics, multicollinearity or collinearity is a situation where the predictors in a regression model are linearly dependent.

Perfect multicollinearity refers to a situation where the predictor variables have an exact linear relationship. When there is perfect collinearity, the design matrix has less than full column rank, and therefore the moment matrix X'X cannot be inverted. In this situation, the parameter estimates of the regression are not uniquely defined, as the normal equations have infinitely many solutions.
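The rank deficiency can be seen directly in code. A minimal NumPy sketch with simulated data (the variable names and data are purely illustrative): a third predictor constructed as the exact sum of the first two leaves the design matrix one short of full column rank, and the moment matrix numerically singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two independent predictors, plus a third that is an exact
# linear combination of them (x3 = x1 + x2).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])

# The design matrix has less than full column rank (3, not 4) ...
rank = np.linalg.matrix_rank(X)

# ... so the moment matrix X'X is singular: its condition number
# is astronomically large and the matrix cannot be reliably inverted.
moment = X.T @ X
cond = np.linalg.cond(moment)

print(rank, cond)
```

Any attempt to solve the normal equations here picks one of infinitely many coefficient vectors that fit equally well.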

Imperfect multicollinearity refers to a situation where the predictor variables have a nearly exact linear relationship. In this case the moment matrix is still invertible, but it is ill-conditioned.
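Imperfect collinearity is commonly quantified with the condition number of the moment matrix or with variance inflation factors (VIFs). A hypothetical NumPy sketch, where one predictor is a noisy copy of another:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# x2 is nearly (but not exactly) a linear function of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# The moment matrix is technically invertible, but badly conditioned.
cond = np.linalg.cond(X.T @ X)

# Variance inflation factor for x1: regress x1 on the remaining
# predictor and compute 1 / (1 - R^2).
Z = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef
r2 = 1.0 - resid.var() / x1.var()
vif = 1.0 / (1.0 - r2)

print(cond, vif)
```

With this data-generating process the VIF runs into the thousands, reflecting how close the predictors are to exact linear dependence.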

Contrary to popular belief, neither the Gauss–Markov theorem nor the more common maximum-likelihood justification for ordinary least squares relies on any assumption about the correlation structure among the predictors (although perfect collinearity can cause problems with some software).

There is no justification for the practice of removing collinear variables as part of regression analysis, and doing so may constitute scientific misconduct. Econometricians and statisticians have facetiously referred to imperfect collinearity as "micronumerosity", noting that it is a problem only when working with an insufficient sample size. Including collinear variables does not reduce the predictive power or reliability of the model as a whole, and it does not bias the coefficient estimates.
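The "micronumerosity" point can be illustrated by simulation: holding the degree of collinearity fixed, the standard error of an individual coefficient still shrinks as the sample grows. A hypothetical sketch (unit error variance assumed, function name illustrative):

```python
import numpy as np

def slope_se(n, rho=0.95, seed=0):
    """OLS standard error of the coefficient on x1 when x1 and x2
    have population correlation rho, assuming unit error variance.
    Purely illustrative simulation."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    # Standard error is the square root of the relevant diagonal
    # entry of (X'X)^{-1}, scaled by the error standard deviation (1 here).
    return np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

# Same degree of collinearity at both sample sizes; the standard
# error nonetheless shrinks roughly like 1/sqrt(n).
print(slope_se(50), slope_se(5000))
```

Collinearity inflates the constant in front of the 1/sqrt(n) rate, but more data drives the uncertainty down all the same.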

High collinearity indicates that it is exceptionally important to include all collinear variables, as excluding any of them will cause biased coefficient estimates, strong confounding, and downward-biased estimates of standard errors.
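The bias from dropping a collinear variable is the classic omitted-variable effect, and it is easy to demonstrate by simulation. A hypothetical sketch: both predictors truly have coefficient 1, yet dropping the second inflates the estimated slope on the first toward 1.9 (its true value plus 0.9 times the omitted coefficient, given the 0.9 correlation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two correlated predictors that both affect y with coefficient 1.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(np.column_stack([x1, x2]), y)  # slope on x1 close to 1.0
b_short = ols(x1[:, None], y)               # slope on x1 close to 1.9 (biased)

print(b_full[1], b_short[1])
```

The full model recovers the true coefficient; the short model silently attributes the omitted variable's effect to the predictor that remains.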

This article is issued from Wikipedia. The text is licensed under Creative Commons Attribution-ShareAlike. Additional terms may apply for the media files.