
I was wondering if it is correct to say that a model-based recursive partitioning model (mob, package partykit) belongs to the family of mixed-effects models.

My point is that a mixed-effects model provides different parameters for each level of the grouping factor, and this is also what a mob model does. The main difference I see is that a mob partitions the grouping variable itself.

Here is an example:

library(partykit); library(lme4)
set.seed(321)

##### Random data
V1 <- runif(100); V2 <- sample(1:3, 100, replace = TRUE)
V3 <- jitter(ifelse(V2 == 1, 2*V1 + 3, ifelse(V2 == 2, -1*V1 + 2, V1)), amount = .2)

##### Mixed-effect model
me <- lmer(V3 ~ V1 + (1 + V1|V2))
coef(me) # linear model coefficients from the mixed-effects model

#$V2
#  (Intercept)         V1
#1  2.99960082  1.9794378
#2  1.96874586 -0.8992926
#3  0.01520725  1.0255424

##### MOB
fit <- function(y, x, start = NULL, weights = NULL, offset = NULL) lm(y ~ x)
mo <- mob(V3 ~ V1 | V2, fit = fit) # equivalent to lmtree
coef(mo) # linear model coefficients from the mob (nearly the same as above)

#      (Intercept) x(Intercept)        xV1
#2  2.99928854           NA  1.9804084
#4  1.97185661           NA -0.9047805
#5  0.01333292           NA  1.0288309
MassCorr

1 Answer


No, a linear regression-based MOB (lmtree) is not a mixed-effects model. However, you used the MOB tree to estimate an interaction model (or nested effect), and mixed-effects models can indeed be used for that as well.

Your data-generating process implements a different intercept and V1 slope for every level of V2. If this interaction is known it can be easily recovered by a suitable linear regression with interaction effect (but V2 should be a categorical factor variable for this).

V2 <- factor(V2)
mi <- lm(V3 ~ 0 + V2 / V1)
matrix(coef(mi), ncol = 2)
##            [,1]       [,2]
## [1,] 2.99928854  1.9804084
## [2,] 1.97185661 -0.9047805
## [3,] 0.01333292  1.0288309

Note that the model fit is equivalent to lm(V3 ~ V1 * V2) but uses a different contrast coding for the coefficients.
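As a quick sanity check (a small sketch, refitting with the standard interaction coding and using the factor-coded V2 from above), both parameterizations give the same fitted values:

mi2 <- lm(V3 ~ V1 * V2)
all.equal(fitted(mi), fitted(mi2))
## should return TRUE: same model space, only the coefficient coding differs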

The estimates obtained above are exactly identical to the lmtree() output (or, equivalently, to manually using mob() + lm() as you did in your post):

coef(lmtree(V3 ~ V1 | V2))
##   (Intercept)         V1
## 2  2.99928854  1.9804084
## 4  1.97185661 -0.9047805
## 5  0.01333292  1.0288309

The main difference is that you had to tell lm() exactly which interaction to consider. lmtree(), on the other hand, "learned" the interaction in a data-driven way. Admittedly, in this case there is not much to learn... but lmtree() could also have decided to make no split at all, or only one split, instead of performing all possible splits.
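For illustration (a small sketch), the fitted tree can be inspected directly to see which splits were actually selected:

tr <- lmtree(V3 ~ V1 | V2)
print(tr)  # split structure with node-wise coefficients
width(tr)  # number of terminal nodes (here one per level of V2)
plot(tr)   # per-node visualization of the fitted regressions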

Finally, your lmer(V3 ~ V1 + (1 + V1 | V2)) specification also estimates a nested (or interaction) effect. However, it uses a different estimation technology with random effects instead of full fixed effects. Also, here you have to prespecify the interaction.
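To make that concrete (a small sketch), the group-wise coefficients of the lmer() fit decompose into overall fixed effects plus group-specific random deviations:

fixef(me)     # overall (population-level) intercept and V1 slope
ranef(me)$V2  # group-specific random deviations from those fixed effects
coef(me)$V2   # their sum: the group-wise coefficients shown in the question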

In short: lmtree() can be considered as a way to find interaction effects in a data-driven way. But these interactions are not estimated with random effects, hence not a mixed-effects model.

P.S.: It is possible to combine lmtree() and lmer() but that's a different story. If you are interested, see the package https://CRAN.R-project.org/package=glmertree and the accompanying paper.
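A minimal sketch of what such a combination could look like (assuming glmertree is installed; the grouping variable id below is purely hypothetical and added only for illustration):

library(glmertree)
d <- data.frame(V1, V2, V3, id = factor(sample(1:10, 100, replace = TRUE)))
## three-part formula: regressors | random effects | partitioning variables
lt <- lmertree(V3 ~ V1 | id | V2, data = d)
coef(lt)   # node-specific regression coefficients from the tree part
ranef(lt)  # random effects for the id grouping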

Achim Zeileis
  • Thanks a lot. This is perfectly clear. Is there any reason to trust the lmer coefficient estimates more than the lmtree estimates? Rather than combining lmtree with a nested lmer as the glmertree package does, is it possible to combine an lmtree with another, nested lmtree (e.g. something like 2lmtree(V3~V1|V2|V4))? – MassCorr Mar 09 '18 at 10:37
  • The random-effects estimation of the interactions with `lmer()` is preferable if you know the grouping factor but it has many levels. Fixed-effects estimation with `lm()` is preferable if the grouping is known but has relatively few levels (compared to the sample size). `lmtree()` has the advantage that it can discover groups in a data-driven way. As for the nested tree, I'm not sure what you mean. Using `lmtree(V3 ~ V1 | V2 + V4)` will determine recursively whether to use `V2` or `V4` or both for grouping. – Achim Zeileis Mar 09 '18 at 20:34
  • Thanks a million for sharing your knowledge! It is very helpful. Regarding my second question, I will open a new post to illustrate it. – MassCorr Mar 12 '18 at 10:10