16

Here is some data and a plot:

set.seed(18)
data = data.frame(y=c(rep(0:1,3),rnorm(18,mean=0.5,sd=0.1)),colour=rep(1:2,12),x=rep(1:4,each=6))

ggplot(data,aes(x=x,y=y,colour=factor(colour)))+geom_point()+ geom_smooth(method='lm',formula=y~x,se=F)

enter image description here

As you can see the linear regression is highly influenced by the values where x=1. Can I get linear regressions calculated for x >= 2 but display the values for x=1 (y equals either 0 or 1). The resulting graph would be exactly the same except for the linear regressions. They would not "suffer" from the influence of the values on abscisse = 1

Matthew Plourde
  • 43,932
  • 7
  • 96
  • 113
Remi.b
  • 17,389
  • 28
  • 87
  • 168

3 Answers3

18

It's as simple as geom_smooth(data=subset(data, x >= 2), ...). It's not important if this plot is just for yourself, but realize that something like this would be misleading to others if you don't include a mention of how the regression was performed. I'd recommend changing transparency of the points excluded.

ggplot(data,aes(x=x,y=y,colour=factor(colour)))+
geom_point(data=subset(data, x >= 2)) + geom_point(data=subset(data, x < 2), alpha=.2) +
geom_smooth(data=subset(data, x >= 2), method='lm',formula=y~x,se=F)

enter image description here

Matthew Plourde
  • 43,932
  • 7
  • 96
  • 113
  • 2
    Aahh I love simple solutions ! Thanks a lot. And thanks also for the advice and transparancy trick. – Remi.b Jun 19 '13 at 15:58
  • What should I do if this solution produces this error "Aesthetics must be either length 1 or the same as the data" on my dataset? – half-pass Nov 25 '17 at 21:09
9

The regular lm function has a weights argument which you can use to assign a weight to a particular observation. In this way you can plain with the influence which the observation has on the outcome. I think this is a general way of dealing with the problem in stead of subsetting the data. Of course, assigning weights ad hoc does not bode well for the statistical soundness of the analysis. It is always best to have a rationale behind the weights, e.g. low weight observations have a higher uncertainty.

I think under the hood ggplot2 uses the lm function so you should be able to pass the weights argument. You can add the weights through the aesthetic (aes), assuming that the weight is stored in a vector:

ggplot(data,aes(x=x,y=y,colour=factor(colour))) + 
    geom_point()+ stat_smooth(aes(weight = runif(nrow(data))), method='lm')

you could also put weight in a column in the dataset:

ggplot(data,aes(x=x,y=y,colour=factor(colour))) + 
    geom_point()+ stat_smooth(aes(weight = weight), method='lm')

where the column is called weight.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
1

I tried @Matthew Plourde's solution, but subset did not work for me. It did not change anything when I used the subset compared to the original data. I replaced subset with filter and it worked:

ggplot(data,aes(x=x,y=y,colour=factor(colour)))+
geom_point(data=data[data$x >= 2,]) + geom_point(data=data[data$x < 2,], alpha=.2) +
geom_smooth(data=data[data$x >= 2,], method='lm',formula=y~x,se=F)
1man
  • 5,216
  • 7
  • 42
  • 56