I'm using the lmtree() function from partykit to partition data using linear regressions. The regressions use weights, and I want to ensure that each branch has a minimum total weight, which I specify with the minsize option. For instance, in the following example the tree has only two branches instead of three because x1=="C" has too small a total weight (100, below minsize=150) to be in its own branch.
library(partykit)

n <- 100
X <- rbind(
  data.frame(TT=1:n, x1="A", weight=2, y=seq(1,length.out=n,by=0.2)+rnorm(n,sd=.2)),
  data.frame(TT=1:n, x1="B", weight=2, y=seq(1,length.out=n,by=0.4)+rnorm(n,sd=.2)),
  data.frame(TT=1:n, x1="C", weight=1, y=seq(1,length.out=n,by=0.6)+rnorm(n,sd=.2))
)
X$x1 <- factor(X$x1)
# The argument is named `weights`; `weight=` only works through partial matching
tr <- lmtree(y ~ TT | x1, data=X, weights=weight, minsize=150)
Fitted party:
[1] root
|   [2] x1 in A: n = 200
|       (Intercept)        TT
|         0.7724903 0.2002023
|   [3] x1 in B, C: n = 300
|       (Intercept)        TT
|         0.5759213 0.4659592
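
To make the arithmetic behind this tree explicit, here is a minimal base-R sketch (no partykit needed) that sums the weights per level of x1 and compares them with minsize. The names w_by_level and too_small are just illustrative, and the response y is left out because it plays no role in the totals.

```r
# Same design as the example above, without the response (not needed here)
n <- 100
X <- rbind(
  data.frame(TT = 1:n, x1 = "A", weight = 2),
  data.frame(TT = 1:n, x1 = "B", weight = 2),
  data.frame(TT = 1:n, x1 = "C", weight = 1)
)
X$x1 <- factor(X$x1)

# Total weight carried by each level of x1
w_by_level <- tapply(X$weight, X$x1, sum)
print(w_by_level)

# Levels whose total weight alone falls short of minsize
minsize <- 150
too_small <- names(w_by_level)[w_by_level < minsize]
print(too_small)
```

Only "C" falls below the threshold, which is consistent with it being merged with "B" in the fitted tree.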
I also have some real-world data that unfortunately is confidential, and it leads to behavior that I do not understand. When I do not specify minsize, lmtree() builds a tree with 30 branches, and in every branch the total weight n is a large number. However, when I specify a minsize that is well below the total weight of every branch in this first tree, the result is a new tree with far fewer branches. I would not have expected the tree to change at all, because minsize does not appear to be binding. Is there any explanation for this result?
UPDATE
Here is a reproducible example, first without minsize:
n <- 100
X <- rbind(
data.frame(TT=1:n, x1=runif(n, 0.0, 0.3), weight=2, y=seq(1,l=n,by=0.2)+rnorm(n,sd=.2)),
data.frame(TT=1:n, x1=runif(n, 0.3, 0.7), weight=2, y=seq(1,l=n,by=0.4)+rnorm(n,sd=.2)),
data.frame(TT=1:n, x1=runif(n, 0.7, 1.0), weight=1, y=seq(1,l=n,by=0.6)+rnorm(n,sd=.2))
)
tr <- lmtree(y ~ TT | x1, data=X, weights = weight)
Fitted party:
[1] root
|   [2] x1 <= 0.29787: n = 200
|       (Intercept)        TT
|         0.8431985 0.1994021
|   [3] x1 > 0.29787
|   |   [4] x1 <= 0.69515: n = 200
|   |       (Intercept)        TT
|   |         0.6346980 0.3995678
|   |   [5] x1 > 0.69515: n = 100
|   |       (Intercept)        TT
|   |         0.4792462 0.5987472
Now let's set minsize=150. The tree no longer has any splits, even though splitting at x1 <= 0.3 versus x1 > 0.3 would give two branches with total weights of 200 and 300, both above minsize.
tr <- lmtree(y ~ TT | x1, data=X, weights = weight, minsize=150)
Fitted party:
[1] root: n = 500
    (Intercept)        TT
      0.6870078 0.3593374
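
As a sanity check on the claim that a split at 0.3 should be admissible, the following base-R sketch tabulates the total weight and the raw row count on each side of that split. The seed and the name split_check are my own additions for illustration. The row counts are shown only for comparison; this sketch does not establish which of the two quantities lmtree() compares with minsize at each step.

```r
set.seed(1)  # assumption: arbitrary seed; the totals below do not depend on it
n <- 100
X <- rbind(
  data.frame(x1 = runif(n, 0.0, 0.3), weight = 2),
  data.frame(x1 = runif(n, 0.3, 0.7), weight = 2),
  data.frame(x1 = runif(n, 0.7, 1.0), weight = 1)
)

minsize <- 150
left <- X$x1 <= 0.3  # candidate split point taken from the question

# Total weight and number of rows on each side of the candidate split
split_check <- data.frame(
  side         = c("x1 <= 0.3", "x1 > 0.3"),
  total_weight = c(sum(X$weight[left]), sum(X$weight[!left])),
  n_rows       = c(sum(left), sum(!left))
)
split_check$weight_ok <- split_check$total_weight >= minsize
print(split_check)
```

By total weight both sides clear minsize=150 (200 and 300), while the left side contains only 100 rows.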