If you want to estimate a different slope in each level of age, then you can use the %in% operator in the formula:
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))
m <- lm(year ~ log(v1269) %in% age, data = df)
summary(m)
This gives (for this entirely random, dummy, silly data set):
> summary(m)
Call:
lm(formula = year ~ log(v1269) %in% age, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-2.93108 -0.66402 -0.05921  0.68040  2.25244
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.02692    0.10705   0.251    0.802
log(v1269):age1  0.20127    0.21178   0.950    0.344
log(v1269):age2 -0.01431    0.24116  -0.059    0.953
log(v1269):age3 -0.02588    0.24435  -0.106    0.916
log(v1269):age4  0.06019    0.21979   0.274    0.785
Residual standard error: 1.065 on 95 degrees of freedom
Multiple R-squared: 0.01037, Adjusted R-squared: -0.0313
F-statistic: 0.2489 on 4 and 95 DF, p-value: 0.9097
Note that this fits a single constant term plus 4 different effects of log(v1269), one per level of age. Visually, this is sort of what the model is doing:
# Prediction grid: every age level crossed with the observed range of v1269
pred <- with(df,
             expand.grid(age = factor(1:4),
                         v1269 = seq(min(v1269), max(v1269), length.out = 100)))
pred <- transform(pred, fitted = predict(m, newdata = pred))
library("ggplot2")
ggplot(df, aes(x = log(v1269), y = year, colour = age)) +
    geom_point() +
    geom_line(data = pred, mapping = aes(y = fitted)) +
    theme_bw() + theme(legend.position = "top")
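
The "one constant plus four slopes" structure can also be checked directly on the design matrix. The following is a minimal sketch that rebuilds the example data so it runs on its own; the names X and slope_cols are introduced here purely for illustration.

```r
# Rebuild the example data and model so this block is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))
m <- lm(year ~ log(v1269) %in% age, data = df)

# Design matrix used by lm(): one intercept column plus
# one log(v1269) column per level of age
X <- model.matrix(m)
head(X)

# Each observation is non-zero in only one slope column,
# the one belonging to its own age group
slope_cols <- X[, -1, drop = FALSE]
table(rowSums(slope_cols != 0))
```

Every row shares the intercept column, which is why the fitted lines all meet at log(v1269) = 0.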

Clearly, this would only be suitable if there were no significant difference in the mean values of year (the response) in the different age categories, because the model forces all four lines through a single shared intercept.
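
One way to check whether the shared intercept is reasonable is to compare against a model that also lets the intercept vary by age. This is only a sketch and not part of the model above: m_int and cmp are names made up for this illustration, and the nested F test via anova() is just one of several ways such a comparison could be done.

```r
# Rebuild the example data and model so this block is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))
m <- lm(year ~ log(v1269) %in% age, data = df)

# Illustrative alternative: a separate intercept AND a separate
# slope for each level of age (age + age:log(v1269))
m_int <- lm(year ~ age / log(v1269), data = df)

# Nested-model F test: do the three extra intercept terms
# improve the fit over the common-intercept model?
cmp <- anova(m, m_int)
cmp
```

With the random dummy data used here there is no real age effect built in, so the comparison is only useful as a template for real data.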
Note that a different parameterisation of the same model can be achieved via the / operator:
m2 <- lm(year ~ log(v1269)/age, data = df)
> m2
Call:
lm(formula = year ~ log(v1269)/age, data = df)
Coefficients:
    (Intercept)       log(v1269)  log(v1269):age2  log(v1269):age3
        0.02692          0.20127         -0.21559         -0.22715
log(v1269):age4
       -0.14108
Note that now the first log(v1269) term is the slope for age == 1, whilst the other terms are the adjustments that must be added to the log(v1269) term to get the slope for the indicated group:
coef(m)[-1]
coef(m2)[2] + c(0, coef(m2)[-(1:2)])
> coef(m)[-1]
log(v1269):age1 log(v1269):age2 log(v1269):age3 log(v1269):age4
0.20127109 -0.01431491 -0.02588106 0.06018802
> coef(m2)[2] + c(0, coef(m2)[-(1:2)])
                log(v1269):age2 log(v1269):age3 log(v1269):age4
     0.20127109     -0.01431491     -0.02588106      0.06018802
But they work out to the same estimated slopes.
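
The equivalence of the two parameterisations can also be checked programmatically rather than by eye. A minimal sketch, again rebuilding the data so it runs standalone; slopes_m and slopes_m2 are names introduced only for this illustration.

```r
# Rebuild the example data and both models so this block is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))
m  <- lm(year ~ log(v1269) %in% age, data = df)
m2 <- lm(year ~ log(v1269) / age, data = df)

# Per-group slopes from each parameterisation, with names dropped
# so that only the numeric values are compared
slopes_m  <- unname(coef(m)[-1])
slopes_m2 <- unname(coef(m2)[2] + c(0, coef(m2)[-(1:2)]))
all.equal(slopes_m, slopes_m2)

# Same model, so the fitted values should agree as well
all.equal(unname(fitted(m)), unname(fitted(m2)))
```

all.equal() compares up to a small numerical tolerance, which is the appropriate check here because the two fits arrive at the same slopes through different arithmetic.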