7

While looking at this issue, I couldn't specify a custom linear model to geom_smooth. My code is as follows:

example.label <- c("A","A","A","A","A","B","B","B","B","B")
example.value <- c(5, 4, 4, 5, 3, 8, 9, 11, 10, 9)
example.age <- c(30, 40, 50, 60, 70, 30, 40, 50, 60, 70)
example.score <- c(90,95,89,91,85,83,88,94,83,90)
example.data <- data.frame(example.label, example.value,example.age,example.score)

p = ggplot(example.data, aes(x=example.age,
                         y=example.value,color=example.label)) +
  geom_point()
  #geom_smooth(method = lm)

cf = function(dt){
  lm(example.value ~example.age+example.score, data = dt)
}

cf(example.data)

p_smooth <- by(example.data, example.data$example.label, 
               function(x) geom_smooth(data=x, method = lm, formula = cf(x)))

p + p_smooth 

I am getting this error/warning:

Warning messages:
1: Computation failed in `stat_smooth()`:
object 'weight' not found 
2: Computation failed in `stat_smooth()`:
object 'weight' not found 

Why am I getting this? And what is the proper method of specifying a custom model to geom_smooth. Thanks.

eipi10
  • 91,525
  • 24
  • 209
  • 285
AK88
  • 2,946
  • 2
  • 12
  • 31
  • 3
    I think the model formula for `geom_smooth` can only be of the form `y ~ f(x)` (e.g., `y ~ x`, which is the default, or `y ~ poly(x, 2)`, `y ~ bs(x, df=4)`, etc.). The formula can only include x and y and they will represent whatever data columns you've passed to `aes` as the x and y aesthetics of the plot. If you want to plot the results of a multivariate regression, you can build a data frame to feed to `predict` and then plot the output using, for example, `geom_line`. – eipi10 Jun 27 '17 at 05:33
  • Do you it worth a try to reshape the data, then use a single column for `x` and `subset` function? – AK88 Jun 27 '17 at 05:37
  • It's not really a matter of reshaping. All you can really do with `geom_smooth` is plot regressions that include the variables used for your y and x aesthetics plus any categorical aesthetics (e.g., color, shape). For example, in your case, you could do `p + geom_smooth(method=lm, formula=y ~ poly(x, 2))` or `p + geom_smooth(method=lm, formula=y ~ splines::bs(x, df=4))` (though you'll need more data for that one to make sense). These will give you separate regression lines for each level of `example.label`. – eipi10 Jun 27 '17 at 05:42
  • For example, with the `mtcars` data frame, here are regression lines for `mpg` vs. `wt`, but categorized by levels of `vs` and `am`: `ggplot(mtcars, aes(wt, mpg, colour=interaction(vs, am, sep="_"))) + geom_point() + geom_smooth(se=FALSE, method="lm")`. – eipi10 Jun 27 '17 at 05:47
  • I see, thanks. Looks like there is no workaround ... – AK88 Jun 27 '17 at 06:07
  • I've added an answer to expand on the issues discussed in the comments. – eipi10 Jun 27 '17 at 16:37

1 Answers1

3

The regression function for a regression model with two continuous predictor variables and a continuous outcome lives in a 3D space (two for the predictors, one for the outcome), while the ggplot graph is a 2D space (one continuous predictor on the x-axis and the outcome on the y-axis). That's the fundamental reason why you can't plot a function of two continuous predictor variables with geom_smooth.

One "workaround" is to pick a few specific values of one of the continuous predictor variables and then plot a line for the other continuous predictor on the x-axis for each of the chosen values of the first variable.

Here's an example with the mtcars data frame. The regression model below predicts mpg using wt and hp. We then plot the predictions of mpg vs. wt for various values of hp. We create a data frame of predictions and then plot using geom_line. Each line in the graph represents the regression prediction for mpg vs. wt for different values of hp. You can, of course, also reverse the roles of wt and hp.

library(ggplot)
theme_set(theme_classic())

d = mtcars
m2 = lm(mpg ~ wt + hp, data=d)

pred.data = expand.grid(wt = seq(min(d$wt), max(d$wt), length=20),
                        hp = quantile(d$hp))
pred.data$mpg = predict(m2, newdata=pred.data)

ggplot(pred.data, aes(wt, mpg, colour=factor(hp))) +
  geom_line() +
  labs(colour="HP Quantiles")

enter image description here

Another option is to use a colour gradient to represent mpg (the outcome) and plot wt and hp on the x and y axes:

pred.data = expand.grid(wt = seq(min(d$wt), max(d$wt), length=100),
                        hp = seq(min(d$hp), max(d$hp), length=100))
pred.data$mpg = predict(m2, newdata=pred.data)

ggplot(pred.data, aes(wt, hp, z=mpg, fill=mpg)) +
  geom_tile() +
  scale_fill_gradient2(low="red", mid="yellow", high="blue", midpoint=median(pred.data$mpg)) 

enter image description here

eipi10
  • 91,525
  • 24
  • 209
  • 285