3

I have 5 (x,y) data points and I'm trying to find a best fit solution consisting of two lines which intersect at a point (x0,y0), and which follow these equations:

y1 = (m1)(x1 - x0) + y0
y2 = (m2)(x2 - x0) + y0

Specifically, I require that the intersection must occur between x=2 and x=3. Have a look at the code:

#Initialize x1, y1, x2, y2
x1 <- c(1,2)
y1 <- c(10,10)

x2 <- c(3,4,5)
y2 <- c(20,30,40)

g <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

q <- nls(c(y1, y2) ~ ifelse(g == TRUE, m1 * (x1 - x0) + y0, m2 * (x2 - x0) + y0), start = c(m1 = -1, m2 = 1, y0 = 0, x0 = 2), algorithm = "port", lower = c(m1 = -Inf, m2 = -Inf, y0 = -Inf, x0 = 2), upper = c(m1 = Inf, m2 = Inf, y0 = Inf, x0 = 3))
coef <- coef(q)
m1 <- coef[1]
m2 <- coef[2]
y0 <- coef[3]
x0 <- coef[4]

#Plot the original x1, y1, and x2, y2
plot(x1,y1,xlim=c(1,5),ylim=c(0,50))
points(x2,y2)

#Plot the fits
x1 <- c(1,2,3,4,5)
fit1 <- m1 * (x1 - x0) + y0
lines(x1, fit1, col="red")

x2   <- c(1,2,3,4,5)
fit2 <- m2 * (x2 - x0) + y0
lines(x2, fit2, col="blue")

So, you can see the data points listed there. Then, I run it through my nls, get my parameters m1, m2, x0, y0 (the slopes, and the intersection point).

But, take a look at the solution: enter image description here

Clearly, the red line (which is supposed to only be based on the first 2 points) is not the best line of fit for the first 2 points. This is the same case with the blue line (the 2nd fit), which supposed to be is dependent on the last 3 points). What is wrong here?

Hack-R
  • 22,422
  • 14
  • 75
  • 131
CodeGuy
  • 28,427
  • 76
  • 200
  • 317
  • The code at http://stackoverflow.com/questions/7114703/finding-where-two-linear-fits-intersect-in-r/7115758#7115758 has been slightly improved and has been corrected so use that as the basis of your solution. – G. Grothendieck Aug 19 '11 at 22:32

2 Answers2

3

This is segmented regression:

# input data

x1 <- c(1,2); y1 <- c(10,10); x2 <- c(3,4,5);  y2 <- c(20,30,40) 
x  <- c(x1, x2); y <- c(y1, y2)

# segmented regression

library(segmented)
fm <- segmented.lm(lm(y ~ x), ~ x, NA, seg.control(stop.if.error = FALSE, K = 2))
summary(fm)

# plot

plot(fm)
points(y ~ x)

See ?lm, ?segmented.lm and ?seg.control for more info.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • The original specification of this problem at http://stackoverflow.com/questions/7123526/r-script-least-squares-solution-to-the-following specifies that the breakpoint must be constrained. From a brief glance at the `segmented` package it doesn't look like there's a way to do that ... (@CodeGuy could admittedly do a better job in linking to previous questions ...) – Ben Bolker Aug 19 '11 at 23:56
  • 1
    @BB, If the actual problem is anywhere near the example data the constraints won't be active. – G. Grothendieck Aug 20 '11 at 00:13
2

I'm not exactly sure what's wrong but I can get it to work by rearranging things a bit. Please note the comment in ?nls about "Do not use ‘nls’ on artificial "zero-residual" data."; I added a bit of noise.

## Initialize x1, y1, x2, y2
x1 <- c(1,2)
y1 <- c(10,10)

x2 <- c(3,4,5)
y2 <- c(20,30,40)

## make single x, y vector
x <- c(x1,x2)
set.seed(1001)
## (add a bit of noise to avoid zero-residual artificiality)
y <- c(y1,y2)+rnorm(5,sd=0.01)

g <- c(TRUE,TRUE,FALSE,FALSE,FALSE) ## specify identities of points

## particular changes:
##   * you have lower=upper=2 for x0.  Did you want 2<x0<3?
##   * specified data argument explicitly (allows use of predict() etc.)
##   * changed name from 'q' to 'fit1' (avoid R built-in function)
fit1 <- nls(y ~ ifelse(g,m1,m1+delta_m)*(x - x0) + y0,
         start = c(m1 = -1, delta_m = 2, y0 = 0, x0 = 2),
         algorithm = "port",
         lower = c(m1 = -Inf, delta_m = 0, y0 = -Inf, x0 = 2),
         upper = c(m1 = Inf, delta_m = Inf, y0 = Inf, x0 = 3),
         data=data.frame(x,y))

#Plot the original 'data'
plot(x,y,col=rep(c("red","blue"),c(2,3)),
           xlim=c(1,5),ylim=c(0,50))

## add predicted values
xvec <- seq(1,5,length.out=101)
lines(xvec,predict(fit1,newdata=data.frame(x=xvec)))

edit: based ifelse clause on point identity, not x position

edit: changed to require second slope to be > first slope

On a second look, I think the issue above is probably due to the use of separate vectors for x1 and x2 above, rather than a single x vector: I suspect these got replicated by R to match up with the g vector, which would have messed things up pretty badly. For example, this stripped-down example:

g <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
ifelse(g,x1,x2)
## [1] 1 2 5 3 4

shows that x2 gets extended to (3 4 5 3 4) before being used in the ifelse clause. The scariest part is that normally one gets a warning such as this:

> x2 + 1:5
[1] 4 6 8 7 9
Warning message:
In x2 + 1:5 :
  longer object length is not a multiple of shorter object length

but in this case there is no warning ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • wait a sec, but then how do I specify which points should go with which lines in this code? – CodeGuy Aug 19 '11 at 22:52
  • I thought you wanted to specify the breakpoint based on position ... just changed code back to using `g` – Ben Bolker Aug 19 '11 at 22:59
  • also, you see how you had to add the noise to avoid "zero-residual artificiality"...sometimes I actually need my m1 slope to be 0. but I tried to remove that "noise" you added and it gave me an error: Convergence failure: singular convergence (7)...so, what should I do in these cases? – CodeGuy Aug 19 '11 at 23:00
  • yeah, nvm. ignore my question about "specifying points"...but! what do I do about the convergence issue? – CodeGuy Aug 19 '11 at 23:01
  • I would need to know more about what you are actually trying to do here. The algorithm that `nls` uses doesn't work well on zero-noise data. You can make the "noise" awfully small (I just tried a standard dev of 1e-6, it was fine), but if you need it to be exactly zero you need to find another way to do this. – Ben Bolker Aug 19 '11 at 23:03
  • by the way, it's considered good form to link to earlier related questions, and explain in what way the current question is sufficiently different from the previously answered questions to justify asking it as a stand-alone question rather than editing the previous one ... – Ben Bolker Aug 19 '11 at 23:06
  • one more question...how can I require that the slope of the second line be larger than the slope of the first line (the V shape that results from the two lines must be concave) – CodeGuy Aug 20 '11 at 19:09
  • Reparameterize your model in terms of the *difference* between the slopes of the first and second lines and constrain that difference to be positive, i.e. `y ~ ifelse(g,m1,m1+delta_slope)*(x-x0)+y0` with `delta_slope>0`. PS: you should probably edit your question to reflect this -- makes it easier for later viewers to see what's going on rather than having to read a long comment thread. – Ben Bolker Aug 21 '11 at 13:29