5

I can create simple graphs. I would like to have observed and predicted values (from a linear regression) on the same graph. I am plotting, say, Yvariable vs Xvariable. There is only 1 predictor and only 1 response. How could I also add the linear regression line to the same graph?

So, to conclude, I need help with:

  • plotting both actual and predicted values
  • plotting the regression line
Gavin Simpson
user2543622

2 Answers

14

Here is one option that plots the observed and predicted values as points in a single plot. Getting the regression line onto the observed points is easier, and I illustrate that second.

First some dummy data

set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)

Fit our model

mod <- lm(y ~ x, data = dat)

Combine the model output and observed x into a single object for plotting

res <- stack(data.frame(Observed = dat$y, Predicted = fitted(mod)))
res <- cbind(res, x = rep(dat$x, 2))
head(res)

Load lattice and plot

require("lattice")

xyplot(values ~ x, data = res, groups = ind, auto.key = TRUE)

The resulting plot should look similar to this

[Figure 1: observed and predicted values plotted as points against x]

If you just want the regression line on the observed data, and the model is a simple straight-line model like the one I show, then you can circumvent most of this and just plot using

xyplot(y ~ x, data = dat, type = c("p","r"), col.line = "red")

(i.e. you don't even need to fit the model or make new data for plotting)

The resulting plot should look like this

[Figure 2: observed data with the fitted regression line in red]
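As a rough check that `type = "r"` really draws the same line as the fitted model, the sketch below overlays two lines in one panel: the one from `panel.lmline()` (the lattice panel function that `type = "r"` relies on) and the one built from `coef(mod)`. It uses a small custom panel function of the kind described further down in this answer. The colours and line widths are arbitrary choices for illustration.

```r
library(lattice)

# Same dummy data as in the answer
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)

# Explicit fit, used only for comparison with the line lattice computes itself
mod <- lm(y ~ x, data = dat)

p <- xyplot(y ~ x, data = dat,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                       # observed points
              panel.lmline(x, y, col = "red", lwd = 3)      # line lattice fits itself
              panel.abline(coef = coef(mod),                # line from the lm() fit
                           col = "black", lty = 2)
            })
print(p)
```

If the two approaches agree, the dashed black line should sit exactly on top of the thick red one.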

An alternative to the first example, which can be used with anything that will give coefficients for the regression line, is to write your own panel function. It is not as scary as it seems:

xyplot(y ~ x, data = dat, col.line = "red",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(coef = coef(mod), ...) ## using mod from earlier
       }
      )

That reproduces Figure 2 above, but by hand.
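If the inline `function(x, y, ...)` looks odd, it may help to see the same panel function defined separately and then handed to `xyplot()` by name. This is only a sketch: `panel_points_and_line` is a hypothetical name chosen for illustration, and the data and model are recreated so the block runs on its own.

```r
library(lattice)

# Same dummy data and model as in the answer
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)
mod <- lm(y ~ x, data = dat)

# Hypothetical name; defining the function draws nothing by itself
panel_points_and_line <- function(x, y, ...) {
  panel.xyplot(x, y, ...)               # draw the observed points
  panel.abline(coef = coef(mod), ...)   # add the line from the fitted model
}

# lattice calls the panel function when the plot is drawn
p <- xyplot(y ~ x, data = dat, col.line = "red",
            panel = panel_points_and_line)
print(p)
```

Passing an anonymous function directly as `panel = function(x, y, ...) { ... }` is the same thing without the intermediate name.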

Assuming you've fitted the model with caret, then

library(caret)

mod <- train(y ~ x, data = dat, method = "lm",
             trControl = trainControl(method = "cv"))

xyplot(y ~ x, data = dat, col.line = "red",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(coef = coef(mod$finalModel), ...) ## using mod from caret
       }
      )

This produces the same plot as Figure 2 above.
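The reason the caret line matches the plain `lm()` line is that, with nothing to tune, the cross-validation only produces an error estimate. The sketch below imitates that idea in base R rather than reproducing caret's actual implementation: the fold assignment and the way the RMSE is summarised are assumptions made for illustration, and caret may differ in detail.

```r
# Same dummy data as in the answer
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)

k <- 10
# Randomly assign each row to one of k folds (5 rows per fold here)
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For each fold: fit on the other folds, predict the held-out rows,
# and record the root mean squared error of those predictions
rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])
  pred <- predict(fit, newdata = dat[fold == i, ])
  sqrt(mean((dat$y[fold == i] - pred)^2))
})
cv_rmse <- mean(rmse)   # cross-validated error estimate

# The model whose line gets plotted is still the fit to all rows
final <- lm(y ~ x, data = dat)
coef(final)
cv_rmse
```

The ten fold-wise fits are thrown away after their errors are recorded, so the coefficients used for the line come from the fit to the full data.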

Gavin Simpson
  • I thought your earlier answer to an identical question was better: http://stackoverflow.com/questions/12972039/plotting-xyplot-with-regression-line-on-lattice-graphics?rq=1 – IRTFM Jul 02 '13 at 17:55
  • I thought your residuals looked weird (it looks like there are just two different lines the observed values can fall on) and then I realized you messed up the call to `rnorm` when creating y. – Dason Jul 02 '13 at 17:57
  • +1 -- Nice demo. (Following up on @Dason's comment, your `rnorm(2.5, sd=2)` evaluates to the same thing as `rnorm(n=2, sd=2)`, with the value then getting recycled out to length 50. You probably wanted `rnorm(50, sd=2)` instead.) – Josh O'Brien Jul 02 '13 at 18:21
  • Oops. That was supposed to be mean 2. Will fix! – Gavin Simpson Jul 02 '13 at 19:51
  • but what if i have created a regression model using caret package>>Test function? i want to plot regression line using those estimates as it fits regression line using 10 fold cross validation – user2543622 Jul 02 '13 at 20:22
  • 1
    @user2543622 I don't mean to be rude, but why don't you mention this when you first ask the question? I'll update my answer... *sigh* – Gavin Simpson Jul 02 '13 at 20:39
  • sorry, i understand your frustration...but this is my first post on stackoverflow. i forgot to add test function part in the original question but i posted a comment and put this new request there...you can see that i posted a comment to original question 2 hrs back. – user2543622 Jul 02 '13 at 20:42
  • @user2543622 Actually you are somewhat confused - what you get from `mod$finalModel` (where `mod` is the object returned by `train()`) will be exactly the same as the coefficients from fitting the model via `lm()`. Hence you can use both examples as per my Answer regardless that you fitted using **caret** - it is doing nothing different in fitting the model but instead it gives you better **error** estimates. I'll add another example of doing the line by hand as that may help, but the 2nd example I show above is probably the easiest. – Gavin Simpson Jul 02 '13 at 20:52
  • @user2543622 Yes, I see you did add that as a comment - about 1 hour after I left my Answer. Despite appearances to the contrary, I am not sat here watching your question for an update in the comments! – Gavin Simpson Jul 02 '13 at 20:53
  • will wait for your example of doing the line by hand :) – user2543622 Jul 02 '13 at 20:53
  • @user2543622 I have now added two more examples, including one using **caret**. Both give *exactly* the same output as the second figure in my original Answer, hence why you don't need to do this by hand, but if you must... – Gavin Simpson Jul 02 '13 at 20:59
  • will frame my questions better the next time...thanks for your help – user2543622 Jul 02 '13 at 21:47
  • i tried your examples and they work...one more question to clear my confusion: why the line returned by mod$finalModel and the line returned by type = c("p","r") are exactly the same? as mod fits model using Cross validation shouldnt line given by it be a bit different? – user2543622 Jul 15 '13 at 21:50
  • @user2543622 because the CV is used only to generate an error estimate in `train` when `method = "lm"`. Read the documentation for **caret** to fully understand what it is doing. As there is nothing to tune in the model during the CV the fitted model is the same. – Gavin Simpson Jul 15 '13 at 22:30
  • i thought that R will fit 10 different models and will use the best one as the final model :( but when i read online documents it says that for LM there are no tuning parameters :( so what is R doing using 10 samples? i tried to find it online but couldnt find my answer...i referred to document http://caret.r-forge.r-project.org/training.html please let me know if you have any better documentation – user2543622 Jul 16 '13 at 20:24
  • i mean i understand that if it is a neural network then by CV it will determine best values for tuning parameters and then fit model to entire data using those parameters...but in case of regression i am confusing with respect to purpose of CV – user2543622 Jul 16 '13 at 20:28
  • @user2543622 The only purpose is to get a cross-validated measure of the error in the model, namely RMSEP. Remember **caret** is for prediction so, as there is nothing to tune, it fits 10 models (in the case of *k*-fold CV) leaving one fold out each time, then it predicts for the left out fold and computes the error. The CV RMSEP is then averaged across the 10 folds. – Gavin Simpson Jul 16 '13 at 21:16
  • i played more with your code...what does the code below do? `panel = function(x, y, ...) { panel.xyplot(x, y, ...); panel.abline(coef = coef(mod$finalModel), ...) }` it seems that we are defining a new function called panel and passing arguments x, y to it...my confusion is that generally function is defined and then used/called in later code. but over here it seems that function definition and function call are the same...am i correct? i understand that panel is an argument in the xyplot function – user2543622 Jul 16 '13 at 22:18
  • You are not appreciating the way lattice plots work. It is a panel function that lattice will use to draw each panel on the plot. That function **will** be called when the plot is drawn. – Gavin Simpson Jul 16 '13 at 23:07
  • i have loved r so far and i find it very useful with respect to drawing graphs...it is just that i want to learn it rather than just memorizing a code..i played with your code more and changed it to `xyplot(y ~ x, data = dat, col.line = "red", panel = function(xone, yone, ...) { panel.xyplot(xone, yone, ...); panel.abline(coef = coef(mod$finalModel), ...) })` it works! why...how does R understand that xone=x and yone=y – user2543622 Jul 17 '13 at 18:10
  • Look at the arguments for `panel.xyplot()` and then note that R uses positional matching for arguments as one method to match arguments with values you supply. – Gavin Simpson Jul 17 '13 at 19:18
  • if we updated our model using test function such as below `mod2 <- train(y ~ x + x^2, data = dat, method = "lm", trControl = trainControl(method = "cv"))`, then how would i draw a line? i tried `xyplot(y ~ x, data = dat, col.line = "red", panel = function(x, y, ...) { panel.xyplot(x, y, ...); panel.abline(coef = coef(mod2$finalModel), ...) })` i think i didnt get the result i want because in mod2$finalModel has coefficients for x and x2, but while drawing the line we are only using x coefficents – user2543622 Jul 17 '13 at 20:35
  • This Q&A isn't your personal direct line to me you know?! How do you propose to visualise a plane (which is what a regression model with two covariates amounts to) in a 2-d plot? `panel.abline()` is most likely only using the first coefficient. One answer to the problem is to draw a partial plot, but **don't** ask me how to do that here. – Gavin Simpson Jul 17 '13 at 20:59
  • as two x variable are x and x^2, i wanted to plot regression line in a graph of "y vs x", obviously because of quadratic term and only one x axis, line will be actually a curve...i am expecting a plot exactly similar to plot above, except that it will be a curve line.....Asked you questions here as i felt that it will be valuable to other readers too..no other intention – user2543622 Jul 17 '13 at 21:06
  • @user2543622 Ah sorry I missed the `x^2` (it might help to avail yourself of some of the comment formatting tools) - think I parsed it as `x2`. This is where lattice and ggplot graphics get much harder to work with. `panel.abline` is only meant to work for single parameter models. You will need to predict from the model over the range of the data (`x`) and plot that, which is not exactly trivial if you don't understand how lattice works etc. – Gavin Simpson Jul 17 '13 at 21:24
  • i guess the easiest thing to do is follow the technique that you have used to draw the first graph above (one with pink dotted line). Obviously that will end up giving pink dotted curve instead of an actual curve. but at least it will be a good alternative to what i have been looking for. Once again appreciate your help. helped me a lot in learning R and will pick up comment formatting...bye :) – user2543622 Jul 17 '13 at 21:46
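For the quadratic case raised in the comments above, one way to follow the advice to "predict from the model over the range of the data and plot that" is sketched here. Two things are assumptions of this sketch rather than part of the thread: it uses plain `lm()` instead of caret, and it writes the squared term as `I(x^2)`, because inside an R formula a bare `x^2` is treated as formula algebra and reduces to just `x`. The names `mod2` and `grid` are illustrative.

```r
library(lattice)

# Same dummy data as in the first answer
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)

# I() protects x^2 so that it is squared arithmetically
mod2 <- lm(y ~ x + I(x^2), data = dat)

# Predict over an evenly spaced grid covering the observed range of x
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
grid$fit <- predict(mod2, newdata = grid)

p <- xyplot(y ~ x, data = dat,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                      # observed points
              panel.lines(grid$x, grid$fit, col = "red")   # fitted curve
            })
print(p)
```

Because the grid is ordered and fairly dense, `panel.lines()` joins the predictions into a smooth-looking curve instead of a set of separate predicted points.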
3

Another option is to use panel.lmlineq from latticeExtra.

library(latticeExtra)
set.seed(0)
xsim <- rnorm(50, mean = 3)
ysim <- (0 + 2 * xsim) * (1 + rnorm(50, sd = 0.3))

## basic use as a panel function
xyplot(ysim ~ xsim, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmlineq(x, y, adj = c(1, 0), lty = 1, col.text = "red",
                col.line = "blue", digits = 1, r.squared = TRUE)
})

[Figure: ysim plotted against xsim, with the fitted line labelled by its equation]

agstudy