2

Given the following dataset:

csf     age    sex  tiv   group
0,30    7,92    1   1,66    1
0,26    33,75   0   1,27    3
0,18    7,83    0   1,43    2
0,20    9,42    0   1,70    1
0,29    22,33   1   1,68    2
0,40    20,75   1   1,56    1
0,26    13,25   0   1,68    1
0,28    6,67    0   1,66    1
0,22    10,58   0   1,38    1
0,22    13,08   0   1,41    2
0,33    36,42   1   1,68    3
0,29    35,00   1   1,34    3
0,11    7,25    1   1,20    2
0,13    10,00   0   1,12    3
0,32    34,58   1   1,33    3
0,68    8,25    1   1,90    1
0,25    11,08   1   1,92    2
0,33    10,92   0   1,24    1
0,20    9,33    1   1,58    1
0,25    51,67   0   1,15    3
0,16    27,67   0   1,19    3
0,19    33,25   0   1,29    3
0,16    7,92    1   1,67    1
0,17    13,42   0   1,34    3
0,45    48      1   1,85    1
0,34    14,67   1   1,80    1
0,23    35,33   0   1,31    3
0,18    15,50   1   1,59    1
0,11    12,08   0   1,34    2
0,21    9,92    0   1,43    1
0,19    8,83    0   1,59    1
0,21    6,83    1   1,78    1
0,13    10      0   1,28    1
0,38    38,42   1   1,63    3
0,27    13,83   0   1,63    1
0,28    15,33   0   1,43    2
0,31    38      1   1,70    1
0,19    13,08   0   1,56    1
0,13    26,25   0   1,07    3
0,14    63,08   1   1,34    3
0,19    10,25   1   1,27    3
0,38    37,25   1   1,63    3
0,28    37,33   0   1,47    3
0,34    20,25   1   1,41    2
0,36    40,33   1   1,44    3
0,26    42,83   0   1,43    2
0,29    46,08   1   1,74    2
0,19    10,25   0   1,56    1
0,20    12,08   1   1,76    1
0,29    30,58   1   1,39    3
0,23    44,67   1   1,45    3

I want to know whether CSF is different between groups. But I know that CSF is highly affected by age, sex, and tiv. So, I would like to plot the differences between groups beyond the influence of age, sex, and tiv. To that end, I need to adjust CSF for those three covariates. My question is: how can I obtain, for each individual, his/her adjusted CSF value?

I did the following linear model:

model1 <- lm(csf ~ age + sex + tiv, data = mri22)

And used the sum of (residuals+intercept) in order to obtain the csf value free from the effects of age, sex, and tiv:

csf_adj <- resid(model1) + coef(model1)[1]

However, I get many negative values, which make no sense given that CSF cannot be negative. So my question is: how can I obtain correct CSF values adjusted for all three covariates?

Bhargav Rao
Borja
  • By definition, residuals of a linear model add up to 0. Unless every observation falls on the regression line, then some residuals will be negative. – lmo Apr 13 '16 at 13:32
  • Take a look at `?predict`. – lmo Apr 13 '16 at 13:38
  • Thanks @lmo These negative residuals should however turn positive once we sum the intercept. – Borja Apr 13 '16 at 13:42
  • Sorry. I missed the adding of the intercept. Because you have a linear model, and because your dependent values are so close to 0, predicted / adjusted values may be negative. A simple solution is to call this "adjusted CSF" that takes on negative values. Otherwise, you might consider some non-linear model. – lmo Apr 13 '16 at 13:52
  • Testing several fits, the linear one is the one explaining more variance. But you're right, the dependent variable is really close to 0 and this may be a problem. Still, I am not sure these are the correct adjusted CSF values. – Borja Apr 13 '16 at 14:38

3 Answers

1

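As @Gopala said, there is apparently no effect of group on the intercept. There is also no effect of group on the slopes (the coefficients of age, sex, and tiv). You can see this in the plots and statistical tests below; the coefficient table shown is the output of summary(model4).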

mri22$group <- as.factor(mri22$group)
plot(mri22)
plot(csf~group,data=mri22,col=mri22$group)

plot(csf~age,data=mri22,col=mri22$group)
plot(csf~sex,data=mri22,col=mri22$group)
plot(csf~tiv,data=mri22,col=mri22$group)

model1 <- lm(csf ~ age + sex + tiv,data=mri22)
summary(model1)

model2 <- lm(csf ~ 0+age + sex + tiv+group,data=mri22)
summary(model2)
model3 <- lm(csf ~ 0+age*I(group) + sex + tiv,data=mri22)
summary(model3)
model4 <- lm(csf ~ 0+age*I(group) + sex*I(group) + tiv*I(group),data=mri22)
summary(model4)


Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
age            0.0025507  0.0020500   1.244   0.2208  
I(group)1     -0.1902470  0.2174566  -0.875   0.3870  
I(group)2     -0.0076027  0.2224419  -0.034   0.9729  
I(group)3     -0.2303957  0.1993927  -1.155   0.2549  
sex            0.0208069  0.0480609   0.433   0.6675  
tiv            0.2552315  0.1428288   1.787   0.0817 .
age:I(group)2 -0.0002252  0.0030392  -0.074   0.9413  
age:I(group)3 -0.0021075  0.0026656  -0.791   0.4339  
I(group)2:sex -0.0048219  0.0790885  -0.061   0.9517  
I(group)3:sex -0.0014738  0.0711362  -0.021   0.9836  
I(group)2:tiv -0.1307945  0.2153850  -0.607   0.5472  
I(group)3:tiv  0.0796898  0.2143078   0.372   0.7120  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robert
  • Thanks @Gopala. In fact I just posted part of the dataset (it's a huge one), because I was more interested in how to get the adjusted CSF values than in checking their statistical significance. The full dataset actually gives a group effect. But even if it's non-significant, how can I extract the adjusted CSF values? – Borja Apr 13 '16 at 14:33
  • For any model, it is just `model4$fitted.values` – Robert Apr 14 '16 at 22:55
  • Actually the answer is the "opposite", what you need are not the fitted values but the residuals: model1$residuals – michael Oct 10 '17 at 14:59
0

You can run a regression like this and it will tell you whether group is significant. Here, it shows that it is not:

df$group <- as.factor(df$group)
fit <- lm(csf ~ age + sex + tiv + group, data = df)
summary(fit)

Call:
lm(formula = csf ~ age + sex + tiv + group, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12429 -0.04760 -0.00306  0.01967  0.34004 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.155845   0.147860  -1.054   0.3000  
age          0.002895   0.001539   1.881   0.0694 .
sex          0.019926   0.036502   0.546   0.5891  
tiv          0.237891   0.097655   2.436   0.0208 *
group2      -0.037555   0.040104  -0.936   0.3563  
group3      -0.013844   0.051717  -0.268   0.7907  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0874 on 31 degrees of freedom
Multiple R-squared:  0.4342,    Adjusted R-squared:  0.3429 
F-statistic: 4.757 on 5 and 31 DF,  p-value: 0.00243
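
The summary above tests each group level against group 1 separately. To test group as a whole, one option is an F-test comparing the models with and without group. This is a minimal sketch, not part of the original answer: it assumes only the first 12 rows of the posted data (read with `dec = ","` because of the decimal commas), and `fit0` is a hypothetical name for the covariates-only model.

```r
# First 12 rows of the posted data; illustrative subset only
df <- read.table(text = "csf age sex tiv group
0,30 7,92 1 1,66 1
0,26 33,75 0 1,27 3
0,18 7,83 0 1,43 2
0,20 9,42 0 1,70 1
0,29 22,33 1 1,68 2
0,40 20,75 1 1,56 1
0,26 13,25 0 1,68 1
0,28 6,67 0 1,66 1
0,22 10,58 0 1,38 1
0,22 13,08 0 1,41 2
0,33 36,42 1 1,68 3
0,29 35,00 1 1,34 3
", header = TRUE, dec = ",")
df$group <- as.factor(df$group)

fit0 <- lm(csf ~ age + sex + tiv, data = df)          # covariates only
fit  <- lm(csf ~ age + sex + tiv + group, data = df)  # covariates + group

# Single F-test (2 degrees of freedom) for the overall group effect
group_test <- anova(fit0, fit)
print(group_test)
```

With only 12 rows this test has very little power; on the full dataset the same call would be the one to look at.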
Gopala
  • Thanks @Gopala. The full dataset actually gives a group effect. But even if it's non-significant, how can I extract the individual adjusted CSF values? – Borja Apr 13 '16 at 15:04
  • You can extract from above using `fit$fitted.values[df$group == 1]` and such for different group's fitted values of `csf`. – Gopala Apr 13 '16 at 15:35
  • Thanks for the reply. However, those (fitted) values are contaminated by the effects of age, sex and tiv. They correlate highly with the covariates. The adjusted CSF values that I aim to obtain should be free of the covariates' influence, so they must not correlate with them. – Borja Apr 13 '16 at 15:54
  • In that case, I have no idea what you are after. Sorry. – Gopala Apr 13 '16 at 16:02
  • 1
    @Borja What you are after are the residuals: fit$residuals. Not sure why everyone is suggesting to use the fitted values which include all the effects and is the opposite of what you are asking. – michael Oct 10 '17 at 14:56
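
The criterion raised in these comments (adjusted values should not correlate with the covariates) can be checked directly. The following is a rough sketch rather than anything posted in the thread: it assumes only the first 12 rows of the question's data, uses the hypothetical name `d`, and fits the covariates-only model from the question instead of the model with group.

```r
# First 12 rows of the posted data; illustrative subset only
d <- read.table(text = "csf age sex tiv group
0,30 7,92 1 1,66 1
0,26 33,75 0 1,27 3
0,18 7,83 0 1,43 2
0,20 9,42 0 1,70 1
0,29 22,33 1 1,68 2
0,40 20,75 1 1,56 1
0,26 13,25 0 1,68 1
0,28 6,67 0 1,66 1
0,22 10,58 0 1,38 1
0,22 13,08 0 1,41 2
0,33 36,42 1 1,68 3
0,29 35,00 1 1,34 3
", header = TRUE, dec = ",")

fit <- lm(csf ~ age + sex + tiv, data = d)

# Fitted values are built from the covariates, so they generally correlate with them
cor(fitted(fit), d$age)

# Least-squares residuals are orthogonal to every regressor in the model,
# so these correlations are zero up to floating-point error
cor(resid(fit), d$age)
cor(resid(fit), d$tiv)
```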
0

Although it's too late..

Your model says that csf depends linearly on age, sex and tiv. This explains some percentage of the variance in the data. The remaining variance is in the residuals.

The model is csf = a*age + b*sex + c*tiv + d. If r is the residual, then the csf predicted by the model is a*age + b*sex + c*tiv + d, while the observed csf (the data you have) is a*age + b*sex + c*tiv + d + r.

Now, if you want to control for age, sex and tiv, replace each individual's covariate values with the corresponding sample means. That is, adjusted csf = a*(mean of age) + b*(mean of sex) + c*(mean of tiv) + d + r.

This adjusted csf varies only because of things other than age, sex or tiv. It differs from residual + intercept by a constant: residual + intercept is the value the model assigns at age = 0, sex = 0 and tiv = 0, which is far outside the data and can therefore be negative.
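
The mean-covariate adjustment described in this answer could be sketched in R as follows. This is an illustration under stated assumptions, not a definitive implementation: only the first 12 rows of the posted data are used, and `avg` and `csf_adj` are hypothetical names.

```r
# First 12 rows of the posted data; illustrative subset only
mri22 <- read.table(text = "csf age sex tiv group
0,30 7,92 1 1,66 1
0,26 33,75 0 1,27 3
0,18 7,83 0 1,43 2
0,20 9,42 0 1,70 1
0,29 22,33 1 1,68 2
0,40 20,75 1 1,56 1
0,26 13,25 0 1,68 1
0,28 6,67 0 1,66 1
0,22 10,58 0 1,38 1
0,22 13,08 0 1,41 2
0,33 36,42 1 1,68 3
0,29 35,00 1 1,34 3
", header = TRUE, dec = ",")

model1 <- lm(csf ~ age + sex + tiv, data = mri22)

# Covariate profile of an "average" subject
# (for a 0/1-coded sex, the mean is simply the proportion of 1s)
avg <- data.frame(age = mean(mri22$age),
                  sex = mean(mri22$sex),
                  tiv = mean(mri22$tiv))

# Adjusted CSF = prediction at the mean covariates + each subject's own residual
csf_adj <- as.numeric(predict(model1, newdata = avg)) + as.numeric(resid(model1))

# In a least-squares fit with an intercept, the prediction at the covariate
# means equals mean(csf), so the shortcut below gives the same values
csf_adj_short <- mean(mri22$csf) + as.numeric(resid(model1))
all.equal(csf_adj, csf_adj_short)
```

Because the adjusted values are centred on the sample mean of csf rather than on the intercept, they stay on the original measurement scale. A very large negative residual could still push a value below zero, but that is much less likely than with residual + intercept.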