4

I am a beginner at multilevel analysis and try to understand how I can do graphs with the plot functions from base-R. I understand the output of fit below but I am struggeling with the visualization. df is just some simple test data:

t <- seq(0, 10, 1)
df <- data.frame(t = t,
                 y = 1.5+0.5*(-1)^t + (1.5+0.5*(-1)^t) * t,
                 p1 = as.factor(rep(c("p1", "p2"), 10)[1:11]))

fit <- lm(y ~ t * p1, data = df)

# I am looking for an automated version of that:
plot(df$t, df$y)
lines(df$t[df$p1 == "p1"], 
      fit$coefficients[1] + fit$coefficients[2] * df$t[df$p1 == "p1"], col = "blue")
lines(df$t[df$p1 == "p2"], 
      fit$coefficients[1] + fit$coefficients[2] * df$t[df$p1 == "p2"] + 
        + fit$coefficients[3] + fit$coefficients[4] * df$t[df$p1 == "p2"], col = "red")

It should know that it has to include p1 and that there are two lines.
The result should look like this: Fit with two levels (interactions needs to be included)

Edit: Predict est <- predict(fit, newx = t) gives the same result as fit but still I don't know "how to cluster".

Edit 2 @Keith: The formula y ~ t * p1 reads y = (a + c * p1) + (b + d * p1) * t. For the "first blue line" c, d are both zero.

Christoph
  • 6,841
  • 4
  • 37
  • 89
  • Will `t` and `y` always be the same? and will you always have a single factor column? – Keith Hughitt Dec 14 '16 at 13:30
  • A better way of obtaining fitted values would be through `predict()`. Pass on a data.frame as `newdata` for which to predict and be amazed. – Roman Luštrik Dec 14 '16 at 13:39
  • @KeithHughitt No, my real problem will be much more complicated, for a simplified example look [here](http://stackoverflow.com/questions/40765869/analysis-using-linear-regression-based-on-subgroups). The main purpose of this question is really "how to get the graph with the correct number of lines". Even in this simple case (`p1` only has 2 levels!) I fail;-) – Christoph Dec 14 '16 at 13:42
  • @RomanLuštrik See my Edit. I know about `predict` (Although I don't understand why it is better than `lm`), but the question here is how to get the "correct number of lines". See also the comment above. It might be a stupid question, but I couldn't figure it out. – Christoph Dec 14 '16 at 13:44
  • What is 'to cluster'? There is no such argument as `newx`. I don't understand what's wrong with the number of lines here? Number of lines should be as many as there are factor levels in `p1`, am I right? – Roman Luštrik Dec 14 '16 at 13:50
  • @RomanLuštrik You are right. In this case there are 2 lines, although there might be situations with less (see e.g. [here](http://stackoverflow.com/questions/40765869/analysis-using-linear-regression-based-on-subgroups)). My question is: How can I plot the 2 fit-lines? I hope the explanation helps. – Christoph Dec 14 '16 at 13:58
  • My first thought would be to use a for-loop to iterate over `levels(df$p1)` and make a new `line()` for each factor level. If you aren't sure which column is the categorical one, you can also use `sapply(df, is.factor)` to figure that out as well. – Keith Hughitt Dec 14 '16 at 13:59
  • Also, are you missing `coefficients[3]` and `coefficients[4]` in your first call to `line()` above? – Keith Hughitt Dec 14 '16 at 14:00
  • @KeithHughitt I don't think I have to include them: My (poor) understanding is that the coefs are ordered according to the formula. Thus, the "first line" corresponds to "not include coefs 3 and 4". See my edit. – Christoph Dec 14 '16 at 14:06

1 Answers1

4

This is how I would do it. I'm including a ggplot2 version of plot as well because I find it better fitted for the way I think about plots. This version will account for the number of levels in p1. If you want to compensate for the number of model parameters, you will just have to adjust the way you construct xy to include all the relevant variables. I should point out that if you omit the newdata argument, fitting will be done on the dataset provided to lm.

t <- seq(0, 10, 1)
df <- data.frame(t = t,
                 y = 1.5+0.5*(-1)^t + (1.5+0.5*(-1)^t) * t,
                 p1 = as.factor(rep(c("p1", "p2"), 10)[1:11]))

fit <- lm(y ~ t * p1, data = df)

xy <- data.frame(t = t, p1 = rep(levels(df$p1), each = length(t)))
xy$fitted <- predict(fit, newdata = xy)

library(RColorBrewer) # for colors, you can define your own
cols <- brewer.pal(n = length(levels(df$p1)), name = "Set1") # feel free to ignore the warning

plot(x = df$t, y = df$y)
for (i in 1:length(levels(xy$p1))) {
  tmp <- xy[xy$p1 == levels(xy$p1)[i], ]
  lines(x = tmp$t, y = tmp$fitted, col = cols[i])
}

enter image description here

library(ggplot2)
ggplot(xy, aes(x = t, y = fitted, color = p1)) +
  theme_bw() +
  geom_point(data = df, aes(x = t, y = y)) +
  geom_line()

enter image description here

Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197