This is my first time asking here.

I am having trouble generating slope dummy variables only (without intercept dummies). When I multiply the dummy variable by the independent variable as shown below, the results contain both the slope dummies and the intercept dummies.

I want to include only the slope dummies and exclude the intercept dummies.

I would appreciate your help. Best, yjkim

reg <- lm(year ~ as.factor(age)*log(v1269))
summary(reg)
Call: 
lm(formula = year ~ as.factor(age) * log(v1269)) 

Residuals: 
   Min     1Q Median     3Q    Max 
-6.083 -1.177  1.268  1.546  3.768 

Coefficients: 
                            Estimate Std. Error t value Pr(>|t|)   
(Intercept)                 5.18076    2.16089   2.398   0.0167 * 
as.factor(age)2             1.93989    2.75892   0.703   0.4821   
as.factor(age)3             2.46861    2.39393   1.031   0.3027   
as.factor(age)4            -0.56274    2.30123  -0.245   0.8069   
log(v1269)                 -0.06788    0.23606  -0.288   0.7737   
as.factor(age)2:log(v1269) -0.15628    0.29621  -0.528   0.5979   
as.factor(age)3:log(v1269) -0.14961    0.25809  -0.580   0.5622   
as.factor(age)4:log(v1269)  0.16534    0.24884   0.664   0.5065   
yjkim
  • Do you want to get rid of the `(Intercept)` term or the three `as.factor(age)2`, `as.factor(age)3`, and `as.factor(age)4` terms? – Gavin Simpson Apr 14 '16 at 19:10

2 Answers

You just need a -1 within the formula:

reg <- lm(year ~ as.factor(age)*log(v1269) -1) 
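
To see what `-1` changes in practice, here is a minimal sketch on simulated data. The data frame `df` is made up for illustration and is not the asker's data; `m_no_int` is a hypothetical name. As far as I can tell, `-1` drops the overall `(Intercept)` column and R then fits one intercept per `age` level, so the intercept dummies stay in the model under a different parameterisation.

```r
# Simulated stand-in for the asker's data (an assumption, for illustration only)
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))

# Same interaction model as in the question, but with the intercept removed
m_no_int <- lm(year ~ age * log(v1269) - 1, data = df)

# Inspect which terms were actually estimated
coef_names <- names(coef(m_no_int))
print(coef_names)
```

If the printed names include `age1` to `age4`, the model still has a separate intercept for each group, which may not be what the question is after.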
Adam Birenbaum

If you want to estimate a different slope in each level of age, then you can use the %in% operator in the formula:

set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))

m <- lm(year ~ log(v1269) %in% age, data = df)
summary(m)

This gives (for this entirely random, dummy, silly data set):

> summary(m)

Call:
lm(formula = year ~ log(v1269) %in% age, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.93108 -0.66402 -0.05921  0.68040  2.25244 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.02692    0.10705   0.251    0.802
log(v1269):age1  0.20127    0.21178   0.950    0.344
log(v1269):age2 -0.01431    0.24116  -0.059    0.953
log(v1269):age3 -0.02588    0.24435  -0.106    0.916
log(v1269):age4  0.06019    0.21979   0.274    0.785

Residual standard error: 1.065 on 95 degrees of freedom
Multiple R-squared:  0.01037,   Adjusted R-squared:  -0.0313 
F-statistic: 0.2489 on 4 and 95 DF,  p-value: 0.9097
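
One way to check which columns this formula generates is to look at the design matrix directly. The sketch below rebuilds the same simulated `df` so that it runs on its own; `m3` is a hypothetical name for a fit that uses a plain `:` interaction without main effects, which, as far as I know, yields the same columns here.

```r
# Rebuild the simulated data so the example is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))

m <- lm(year ~ log(v1269) %in% age, data = df)

# Columns of the design matrix: one intercept column plus one
# log(v1269) column per level of age
design_cols <- colnames(model.matrix(m))
print(design_cols)

# A bare interaction term with no main effects (assumed equivalent here)
m3 <- lm(year ~ log(v1269):age, data = df)
same_coefs <- isTRUE(all.equal(unname(coef(m)), unname(coef(m3))))
print(same_coefs)
```

Each `log(v1269):ageK` column is `log(v1269)` multiplied by the indicator for level K, which is the slope dummy the question asks for.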

Note that this fits a single constant term plus 4 different effects of log(v1269), one per level of age. Visually, this is sort of what the model is doing:

pred <- with(df,
             expand.grid(age = factor(1:4),
                         v1269 = seq(min(v1269), max(v1269), length.out = 100)))
pred <- transform(pred, fitted = predict(m, newdata = pred))

library("ggplot2")
ggplot(df, aes(x = log(v1269), y = year, colour = age)) + 
  geom_point() +
  geom_line(data = pred, mapping = aes(y = fitted)) +
  theme_bw() + theme(legend.position = "top")

[Figure: simulated data plus fitted slopes from the nested slope model described in the answer]

Clearly, this would only be suitable if there were no significant difference in the mean values of year (the response) between the age categories, because all four fitted lines share a single intercept.
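
One possible way to probe that assumption is an F-test comparing the common-intercept model with the full interaction model, which also allows a separate intercept per group. This is only a sketch on the simulated data; `m_full` and `cmp` are hypothetical names, and the data frame is rebuilt so that the block runs on its own.

```r
# Rebuild the simulated data so the example is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))

# Common intercept, separate slopes (5 coefficients)
m <- lm(year ~ log(v1269) %in% age, data = df)

# Separate intercepts and separate slopes (8 coefficients)
m_full <- lm(year ~ age * log(v1269), data = df)

# Nested-model F-test on the 3 extra intercept terms
cmp <- anova(m, m_full)
print(cmp)
```

A small p-value in the second row would suggest that the group-specific intercepts matter and that the slope-only model is too restrictive for the data at hand.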

Note that a different parameterisation of the same model can be achieved via the / operator:

m2 <- lm(year ~ log(v1269)/age, data = df)

> m2

Call:
lm(formula = year ~ log(v1269)/age, data = df)

Coefficients:
    (Intercept)       log(v1269)  log(v1269):age2  log(v1269):age3  
        0.02692          0.20127         -0.21559         -0.22715  
log(v1269):age4  
       -0.14108

Note that now the first log(v1269) term is the slope for age == 1, whilst the other terms are the adjustments to be applied to the log(v1269) term to get the slope for the indicated group:

coef(m)[-1]
coef(m2)[2] + c(0, coef(m2)[-(1:2)])

> coef(m)[-1]
log(v1269):age1 log(v1269):age2 log(v1269):age3 log(v1269):age4 
     0.20127109     -0.01431491     -0.02588106      0.06018802 
> coef(m2)[2] + c(0, coef(m2)[-(1:2)])
                log(v1269):age2 log(v1269):age3 log(v1269):age4 
     0.20127109     -0.01431491     -0.02588106      0.06018802

But they work out to the same estimated slopes.
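
The same comparison can be made programmatically rather than by eye. The sketch below rebuilds the simulated data, refits both parameterisations, and uses `all.equal()` to compare the implied slopes and the fitted values; `slopes_m` and `slopes_m2` are hypothetical names.

```r
# Rebuild the simulated data so the example is self-contained
set.seed(1)
df <- data.frame(age = factor(sample(1:4, 100, replace = TRUE)),
                 v1269 = rlnorm(100),
                 year = rnorm(100))

m  <- lm(year ~ log(v1269) %in% age, data = df)
m2 <- lm(year ~ log(v1269)/age, data = df)

# Per-group slopes from each parameterisation, with names stripped
slopes_m  <- unname(coef(m)[-1])
slopes_m2 <- unname(coef(m2)[2] + c(0, coef(m2)[-(1:2)]))

# Both checks compare up to numerical tolerance
same_slopes <- isTRUE(all.equal(slopes_m, slopes_m2))
same_fitted <- isTRUE(all.equal(fitted(m), fitted(m2)))
print(c(same_slopes, same_fitted))
```

Identical fitted values are what one would expect if the two formulas describe the same model and differ only in how the slope coefficients are coded.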

Gavin Simpson