
I was building a logistic regression model in R, but when I checked the coefficients with summary(model), the output showed NA in all four columns (Estimate, Std. Error, z value and Pr(>|z|)) for one of my independent variables. My other three variables were estimated fine.

I also checked for missing values, but there were none. I tried switching the variable between continuous and discrete representations with as.numeric and as.integer, but it still comes out as NA in the output. The variable itself measures the total volume of blood donated.

I can't figure this out and it is bothering me. Thanks

Poly
  • Relevant posts: [Logistic regression in R returning NA values](https://stats.stackexchange.com/questions/25839/logistic-regression-in-r-returning-na-values) and [NA in glm model](https://stats.stackexchange.com/questions/212903/na-in-glm-model) – Maurits Evers Apr 01 '18 at 12:39
  • 1
  • tl;dr: Collinearity between predictor variables will result in `NA` "estimates" for (some of the) predictor variables that are linearly dependent. – Maurits Evers Apr 01 '18 at 12:44

1 Answer


Here is an example elaborating on the comment I made above. I'm using a simple linear model, but the same principle applies to your logistic regression model.

  1. Let's generate some sample data from the model y = x1 + x2 + epsilon, where the two predictor variables x1 and x2 are linearly dependent: x2 = 2.5 * x1.

    # Generate sample data
    set.seed(2017);
    x1 <- seq(1, 100);
    x2 <- 2.5 * x1;
    y <- x1 + x2 + rnorm(100);
    
  2. We fit the model.

    df <- cbind.data.frame(x1 = x1, x2 = x2, y = y);
    fit <- lm(y ~ x1 + x2, df);
    
  3. Look at parameter estimates.

    summary(fit);
    #
    #Call:
    #lm(formula = y ~ x1 + x2, data = df)
    #
    #Residuals:
    #     Min       1Q   Median       3Q      Max
    #-2.50288 -0.75360 -0.01388  0.67935  3.08515
    #
    #Coefficients: (1 not defined because of singularities)
    #            Estimate Std. Error t value Pr(>|t|)
    #(Intercept) 0.166567   0.215534   0.773    0.441
    #x1          3.496831   0.003705 943.719   <2e-16 ***
    #x2                NA         NA      NA       NA
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    #Residual standard error: 1.07 on 98 degrees of freedom
    #Multiple R-squared:  0.9999,   Adjusted R-squared:  0.9999
    #F-statistic: 8.906e+05 on 1 and 98 DF,  p-value: < 2.2e-16
    

You can see that the estimates for x2 are NA. This is a direct consequence of x1 and x2 being linearly dependent: x2 is redundant, and the data can be described by the estimated linear model y = 3.4968 * x1 + epsilon. This agrees well with the theoretical model y = x1 + 2.5 * x1 + epsilon = 3.5 * x1 + epsilon.
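As a quick follow-up (a minimal sketch, not part of the original fit; the names y_bin and fit_glm are only for illustration), you can verify the linear dependence directly and confirm that glm() treats it the same way:

    # The two predictors are perfectly correlated
    cor(df$x1, df$x2);
    #[1] 1
    
    # alias() reports the terms that lm() dropped because of
    # singularities, expressed in terms of the retained predictors
    # (here: x2 = 2.5 * x1)
    alias(fit);
    
    # The same NA pattern appears with a logistic regression:
    # simulate a binary outcome driven by x1 and refit with glm()
    set.seed(2018);
    df$y_bin <- rbinom(100, 1, plogis(0.05 * x1 - 2.5));
    fit_glm <- glm(y_bin ~ x1 + x2, family = binomial, data = df);
    summary(fit_glm);   # the x2 row again shows only NAs

Dropping the redundant predictor, e.g. lm(y ~ x1, df), gives the same fit without the NA row.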

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Wow, thank you! You're right: I checked again and it is definitely redundant, because the variable had a perfect correlation of 1 with another variable, which I must have missed because I thought it was correlated against itself. That's embarrassing... I should probably remove this thread. Thanks! – Poly Apr 01 '18 at 14:36
  • No problem @Poly; general SO practice is to *not* remove posts. They might be useful (and get referenced) in future questions. You should however accept the solution by setting the check-mark next to the solution to mark the question as closed. – Maurits Evers Apr 01 '18 at 14:38