2

I want to measure the distance between a set of points and a 1:1 line. I can build a linear model and get the residuals from the best fit, but I can't get the same measure relative to a 1:1 line. Any helpful hints?

# build a data frame of random numbers
x = runif(100, 0, 100)
y = runif(100, 0, 100)
df = data.frame(x, y)
# build a linear model
lm1 <- lm(y ~ x, data = df)
summary(lm1)
# plot the data, the lm best fit, and the 1:1 (red) line
plot(y ~ x, data = df, pch = 16)
abline(lm1)
abline(0, 1, col = "red")
# get residuals for the linear model
y.resid = resid(lm1)
I Del Toro
  • By 1:1 line, do you mean y = 1 * x + 0 ? – vpipkt Oct 31 '14 at 11:45
  • yes that would be the red line on the plot – I Del Toro Oct 31 '14 at 11:46
  • Keep in mind that for OLS regression the residuals don't give the (minimal) distance (which is orthogonal), but the difference between expected y value and measured y value for a given x value. – Roland Oct 31 '14 at 12:03
  • Could you clarify the question: are you asking for the distance from the points to the 1:1 line, or for the residuals? –  Oct 31 '14 at 12:30

3 Answers

4

I suggest using y-x, just like @vpipkt suggested. Just for the sake of completeness: you can also create a linear model with fixed coefficients y-x ~ 0 and take the residual there.

resid(lm(y-x ~ 0))

Of course this is just more complicated and gives the same result as y-x, but it explicitly states that you are taking residuals and not calculating the minimal distance to the line (cf. @user3969377's answer).
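A quick way to check that the two approaches agree is sketched below. The seed and the variable names `r_direct` and `r_model` are assumptions added for illustration; they are not part of the original question.

```r
# Regenerate data like the question's (the seed is an assumption, for reproducibility)
set.seed(1)
x <- runif(100, 0, 100)
y <- runif(100, 0, 100)

# Residuals from the 1:1 line, computed directly
r_direct <- y - x

# The same residuals from a model with no free coefficients
r_model <- resid(lm(y - x ~ 0))

# resid() returns a named vector, so drop the names before comparing
all.equal(unname(r_model), r_direct)
```

Both vectors hold signed vertical differences, so a point above the 1:1 line gets a positive value and a point below it gets a negative one.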

shadow
  • 21,823
  • 4
  • 63
  • 77
3

To determine the distance between a set of points and a 1:1 line, use

dist[x-y=0; (x0,y0)] = abs(x0 - y0) / sqrt(2)

ref http://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line

For your example,

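par(pty="s")
plot(y~x, data=df, pch=16)
abline(lm1)
abline(0,1, col="red")
# get residuals for the linear model
y.resid = resid(lm1)
# the 1:1 line written as a*x + b*y + c = 0
a=1;b=-1;c=0
# foot of the perpendicular from each point onto the line
xi = (b*(b*x-a*y)-a*c) / (a^2+b^2)
yi = (a*(-b*x+a*y)-b*c) / (a^2+b^2)
segments(x,y,xi,yi,col="blue")
# perpendicular distance from each point to the line
yr = abs(a*x+b*y+c)/sqrt(a^2+b^2)
hist(yr)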
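The distance formula can also be wrapped in a small helper, which makes it easy to see how the perpendicular distance relates to the vertical residual y-x. This is only a sketch: `dist_to_line` is a hypothetical name, and the line is assumed to be given as a*x + b*y + c = 0.

```r
# Hypothetical helper: perpendicular distance from points (x0, y0)
# to the line a*x + b*y + c = 0
dist_to_line <- function(x0, y0, a, b, c) {
  abs(a * x0 + b * y0 + c) / sqrt(a^2 + b^2)
}

# A single point checked by hand: (0, 2) is sqrt(2) away from the line y = x
d_single <- dist_to_line(0, 2, a = 1, b = -1, c = 0)

# For the 1:1 line, the perpendicular distance is |y - x| / sqrt(2)
set.seed(1)  # assumed seed, for reproducibility
x <- runif(100, 0, 100)
y <- runif(100, 0, 100)
d_perp <- dist_to_line(x, y, a = 1, b = -1, c = 0)
all.equal(d_perp, abs(y - x) / sqrt(2))
```

For the 1:1 line, then, the perpendicular distance is the absolute vertical residual scaled by a constant factor of 1/sqrt(2), so both measures rank the points identically.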
2

In the sense of residuals from the model y=x, the distance is simply `y-x`.

r = y-x
plot(r~x)
abline(h=0)

You can expand this to a more general linear model y = ax + b. Residuals are

r = y - a*x - b
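As a sanity check on the general formula, the sketch below takes a and b from a fitted model and confirms that the formula reproduces the residuals from `resid()`. The seed and the names `fit`, `r_manual`, and `r_one_to_one` are assumptions for illustration.

```r
set.seed(1)  # assumed seed, for reproducibility
x <- runif(100, 0, 100)
y <- runif(100, 0, 100)

# Fit y = a*x + b and pull out the estimated slope and intercept
fit <- lm(y ~ x)
b <- coef(fit)[["(Intercept)"]]
a <- coef(fit)[["x"]]

# Residuals computed by hand from the general formula
r_manual <- y - a * x - b
all.equal(unname(resid(fit)), r_manual)

# Setting a = 1 and b = 0 gives the residuals from the 1:1 line
r_one_to_one <- y - 1 * x - 0
all.equal(r_one_to_one, y - x)
```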
vpipkt