3

I'm using ggplot2 to plot some time-series data along with a linear regression line. I'm interested to determine when the regression line will hit 82%. Visual inspection of the graph suggests that this will happen around November 15, 2017. But when I use R's predict.lm() function, I get a different answer: Aug. 12, 2017. Shouldn't these two methods give me the same answer? Ultimately, I'd like to annotate the graph with a text label that shows the intercept date.

require(ggplot2)
temp <- "End.Date    Save.Rate
1       2015-05-31     0.67
2       2015-07-31     0.67
3       2015-09-30     0.69
4       2015-11-30     0.71
5       2016-01-30     0.70
6       2016-03-31     0.72"

df <- read.table(text = temp, header = TRUE)
df$End.Date <- as.POSIXct(df$End.Date, origin="1970-01-01", tzone="America/New_York")

save.rate.lm = lm(End.Date ~ Save.Rate, data=df)
newdata <- data.frame(Save.Rate = 0.82)
temp <- predict.lm(save.rate.lm, newdata)
predicted.date <- as.POSIXct(as.data.frame(temp)[1,], origin="1970-01-01",
                             tzone="America/New_York")
print(predicted.date)

x.lims <- c(as.POSIXct(NA), as.POSIXct("2017-12-31", origin="1970-01-01",
                                       tzone="America/New_York"))
p <- ggplot(df, aes(x=End.Date, y=Save.Rate)) +
  geom_point() +
  stat_smooth(method='lm', fill=NA, fullrange=TRUE) + 
  theme(axis.text.x=element_text(angle = -45, hjust = 0)) +
  scale_y_continuous(labels = percent) +
  scale_x_datetime(breaks = date_breaks('month'), labels = date_format('%b-%Y'),
                   limits=x.lims) +
  geom_hline(yintercept=0.82)
print(p)
Alex C.
  • 65
  • 6

2 Answers2

6

You can't just invert a linear regression (i.e. date ~ 1+rate vs rate ~ 1 +date) and expect to get the same answer (e.g., see this question on CrossValidated). As far as I know there is no simple way to use predict.lm on the inverse-regression to get the answer you are looking for. You need to fit rate as a function of date and use some algebra to get the predicted date. Below I show a simple calculation that works for your specific question; the answers to this question and this question give you some additional canned solutions ...

fit2 = lm(Save.Rate ~ End.Date, data=df)
## y = a + bx
## x* = (y-a)/b
cc <- coef(fit2)
pred.date <- as.POSIXct((0.82-cc[1])/cc[2],origin="1970-01-01",
                             tzone="America/New_York")
##               (Intercept) 
## "2017-11-19 17:26:28 EST" 

The picture:

p+geom_vline(xintercept=as.numeric(pred.date),lty=2)

enter image description here

Community
  • 1
  • 1
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thank you for the answer. I'm curious if there is a way to accomplish the same (correct) result using `predict.lm()`, like I tried to do in my original code example. – Alex C. Jun 06 '16 at 13:54
  • @AlexC. : don't think so. – Ben Bolker Jun 06 '16 at 13:58
  • 1
    Inverse prediction is a common task in analytical chemistry. Thus, there is `chemCal::inverse.predict`. It is not vectorized and doesn't preserve attributes (such as a datetime class), but it calculates standard error and confidence interval. – Roland Jun 06 '16 at 14:51
2

Ben Bolker explains why your approach doesn't work.

However, you can just flip the axes in ggplot2 with coord_flip and use a regression with error terms in x-direction (instead of the usual y-direction):

p <- ggplot(df, aes(y=End.Date, x=Save.Rate)) +
  geom_point() +
  stat_smooth(method='lm', fill=NA, fullrange=TRUE) + 
  theme(axis.text.x=element_text(angle = -45, hjust = 0)) +
  scale_y_datetime(breaks = date_breaks('month'), labels = date_format('%b-%Y'),
                   limits=x.lims) +
  geom_vline(xintercept=0.82) +
  geom_hline(yintercept = as.numeric(predicted.date)) + #to illustrate it works
  coord_flip()
print(p)

However, this is not recommended since the uncertainty of your time values is most likely much smaller than the uncertainty of your Save.Rate values. Thus, you should probably do the regression Save.Rate ~ End.Date as in your plot and do inverse prediction as shown in Ben's answer.

Roland
  • 127,288
  • 10
  • 191
  • 288