
I have measurements from two groups, where each group has the same three levels. Here's my example data.frame:

df <- data.frame(measurement = c(rnorm(10,1,1),rnorm(10,0.75,1),rnorm(10,1.25,1),
                                 rnorm(10,0.5,1),rnorm(10,1.75,1),rnorm(10,0.25,1)),
                 group = as.factor(c(rep("a",30),rep("b",30))),
                 level = as.factor(rep(c(rep("x",10),rep("y",10),rep("z",10)),2)))
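Note that the rnorm calls aren't seeded, so the exact numbers in the outputs further down will vary from run to run; placing a fixed seed before the data.frame call makes the example reproducible:

set.seed(1)  # any fixed seed works; makes the rnorm draws reproducible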

I'm interested in quantifying how measurement in each level is affected by group.

I guess a linear model (lm) is the appropriate approach for this, where the group:level interaction terms capture the effects I'm interested in.

Is there a way to specify an lm that will only compute these interaction terms: groupb:levelx, groupb:levely, and groupb:levelz? I believe this tells me how each level is affected by group "b" (relative to group "a"), which I think is what I'm interested in.

The closest I got is:

lm(measurement ~ 0 + group * level - group, data = df)

But that still computes the effects of levelx, levely, and levelz, which I'm not interested in.

user1701545
  • The question seems a bit unclear. Group 'b' is relative to group 'a' and vice versa, since this is a binary variable; you cannot change that. You either exclude the records with group == 'a' and run the regression only on group == 'b' without using group in the model, or you use the variable as it is; then, if you use lm, one of the levels will be left out and all the rest will be relative to it. You also seem to introduce interactions in your lm, but do you really need them? – LyzandeR Nov 04 '15 at 16:50
  • Just to make clear that my answer below makes sense :-) The statistical equivalent of "quantifying how measurement in each level is affected by group", in my opinion, is: for each level x, y, z, is there any difference between the (average) measurements of a and b? If there is, is it statistically significant? – AntoniosK Nov 04 '15 at 18:05
  • Yes, that is exactly my intention. Thanks a lot – user1701545 Nov 04 '15 at 18:10
  • @AntoniosK, my actual measurements are proportions, which include zeros (but not ones). Any idea what model will fit that best? (I'm aware that this is not an SO question per se, hence the comment) – user1701545 Nov 04 '15 at 18:31
  • I guess the zeros you have mean 0% and not a flag for "no success", right? So you should be able to go back to the data from before you calculated those percentages, use the 1/0 values (success/no success; a binary variable), and apply a logistic regression model. – AntoniosK Nov 04 '15 at 19:00
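A minimal sketch of that last suggestion, assuming the raw 1/0 outcomes are still available (the data frame raw_df and the column success are hypothetical names):

# logistic regression on the raw binary outcomes, with the same group/level structure
fit <- glm(success ~ group * level, family = binomial, data = raw_df)
summary(fit)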

2 Answers


As @LyzandeR mentioned above, you should clarify a bit more what you want. Based on what you said, "how measurement is affected by group 'b' (relative to group 'a') for each level", I guess there are 3 simple ways to do that.

df <- data.frame(measurement = c(rnorm(10,1,1),rnorm(10,0.75,1),rnorm(10,1.25,1),
                                 rnorm(10,0.5,1),rnorm(10,1.75,1),rnorm(10,0.25,1)),
                 group = as.factor(c(rep("a",30),rep("b",30))),
                 level = as.factor(rep(c(rep("x",10),rep("y",10),rep("z",10)),2)))


library(dplyr)

#### calculate stats (mean values) ---------------------------------------------
df %>% group_by(level, group) %>% summarise(MeanMeasurement = mean(measurement))

#     level  group MeanMeasurement
#    (fctr) (fctr)           (dbl)
# 1      x      a       1.6708659
# 2      x      b       0.8487751
# 3      y      a       0.7977769
# 4      y      b       1.4209206
# 5      z      a       1.5484668
# 6      z      b      -0.3244225

#### build a model for each level ---------------------------------------------
summary(lm(measurement ~ group  , data = df[df$level=="x",]))

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   1.6709     0.3174   5.264 5.27e-05 ***
# groupb       -0.8221     0.4489  -1.831   0.0837 .  


summary(lm(measurement ~ group  , data = df[df$level=="y",]))

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept)   0.7978     0.2565   3.111  0.00604 **
# groupb        0.6231     0.3627   1.718  0.10295   


summary(lm(measurement ~ group  , data = df[df$level=="z",]))

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   1.5485     0.3549   4.363 0.000375 ***
# groupb       -1.8729     0.5019  -3.731 0.001528 ** 



#### build a model only with interactions --------------------------------------
summary(lm(measurement ~ group : level , data = df))

# Coefficients: (1 not defined because of singularities)
#               Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.3244     0.3123  -1.039 0.303452    
# groupa:levelx   1.9953     0.4416   4.518 3.43e-05 ***
# groupb:levelx   1.1732     0.4416   2.657 0.010354 *  
# groupa:levely   1.1222     0.4416   2.541 0.013951 *  
# groupb:levely   1.7453     0.4416   3.952 0.000227 ***
# groupa:levelz   1.8729     0.4416   4.241 8.76e-05 ***
# groupb:levelz       NA         NA      NA       NA    

If you check the stats from the 1st approach against the models' coefficients, you'll see that all 3 approaches agree with each other.

I'd go with the 2nd approach, as it's the only one that tells you whether the group difference (a vs. b) within each level is statistically significant. The 1st approach just reports means. The 3rd approach includes p-values, but they correspond to comparisons against a baseline cell, not to comparisons between groups a and b within a level (note the NA for groupb:levelz: with an intercept plus all six group:level combinations, one of them is redundant, so R drops it and uses that cell as the baseline).
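For completeness, the single-model alternative that Gregor Thomas suggests in the comments below, measurement ~ level + group:level, gives one group-b contrast per level. The point estimates of those contrasts match the per-level models of the 2nd approach (the standard errors are pooled across all six group/level cells), but the model still estimates the level terms you said you're not interested in:

# level main effects plus a within-level group-b contrast for each level
summary(lm(measurement ~ level + group:level, data = df))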

You mentioned "ONLY compute these interaction terms: groupb:levelx, groupb:levely, and groupb:levelz", which means you don't want the other 3 interactions of a with x, y, z. In other words, you force your model to include only those 3 interactions.

You can do that manually like this:

df <- data.frame(measurement = c(rnorm(10,1,1),rnorm(10,0.75,1),rnorm(10,1.25,1),
                                 rnorm(10,0.5,1),rnorm(10,1.75,1),rnorm(10,0.25,1)),
                 group = as.factor(c(rep("a",30),rep("b",30))),
                 level = as.factor(rep(c(rep("x",10),rep("y",10),rep("z",10)),2)))

library(dplyr)

df %>%
  mutate(interactions = paste0(group, ":", level),
         # collapse all of group "a" into a single baseline category,
         # keeping only b:x, b:y and b:z as separate levels
         interactions = ifelse(group == "a", "a", interactions)) -> df2

summary(lm(measurement ~ interactions, data = df2))

# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
# (Intercept)       0.9318     0.1831   5.089 4.36e-06 ***
# interactionsb:x  -0.7803     0.3662  -2.131  0.03752 *  
# interactionsb:y   0.2747     0.3662   0.750  0.45638    
# interactionsb:z  -1.1367     0.3662  -3.104  0.00299 ** 

but now the other 3 interactions are combined into a single baseline, so each of your 3 interactions (b:x, b:y, b:z) is compared against group a as a whole. You are not comparing a vs. b within x, y and z; you are effectively comparing x vs. y vs. z within group b, each against the pooled group a.
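A quick numerical check of this interpretation, using the df2 built above: the intercept is the mean of all group-a measurements, and each interactionsb:* coefficient is that cell's mean minus that overall group-a mean.

# cell means of the collapsed factor: "a" (all of group a), "b:x", "b:y", "b:z"
df2 %>% group_by(interactions) %>% summarise(MeanMeasurement = mean(measurement))
# e.g. the interactionsb:x estimate equals the "b:x" mean minus the "a" mean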

AntoniosK
  • You might consider the model with `measurement ~ level + group:level`, I think this highlights the contrast OP is asking for. – Gregor Thomas Nov 04 '15 at 17:27
  • @Gregor I thought about this point of view, but I got confused and that's why I asked for clarifications. Not sure if he wants the `levely`, `levelz` and `levelx` (baseline) terms that this model includes. – AntoniosK Nov 04 '15 at 17:32
  • Thanks a lot for the discussion. I hope my updated post clarifies what I'm interested in. As @AntoniosK noted, measurement ~ level + group:level, which is equivalent to measurement ~ 0 + group * level - group, still computes levely, levelz and levelx, which I'm not interested in. – user1701545 Nov 04 '15 at 17:41
  • @user1701545 I think what you ask is not possible, as the interpretation wouldn't be what you want/describe. When there are 6 interactions and you want your model to include only 3 of them, the other 3 have to be combined together to form the baseline. I'll update my answer to explain a bit. – AntoniosK Nov 04 '15 at 17:46

Based on this sentence: "Is there a way to specify an lm that will only compute these interaction terms: groupb:levelx, groupb:levely, and groupb:levelz?", I think you just want:

lm(measurement ~ level + 0, subset = group == "b", data = df)
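For what it's worth, a quick check of what this fits (using the df from the question): with the intercept dropped and only the group-b rows used, each coefficient is simply the mean measurement of that level within group b, rather than a contrast of group b against group a.

fit_b <- lm(measurement ~ level + 0, subset = group == "b", data = df)
coef(fit_b)  # levelx, levely, levelz: per-level means within group "b"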
IRTFM