-2

I want to do a piecewise regression with two breakpoint in R: first a horizontal line with slope 0, then a linear line and then again a horizontal line with slope 0. The two breakpoints should be fitted also.

My data looks like this (in total I have 60 similar datasets):

x <- c(1.306, 1.566, 1.736, 1.854, 2.082, 2.328, 2.650, 2.886, 3.162, 3.392) 
y <- c(176.4, 188.0, 193.8, 179.4, 134.4, 119.0, 66.2, 58.2, 58.2, 41.2)

Anybody knows how to do it?

Jana
  • 21
  • 2
  • If it might be of some use, I found that a Gaussian peak equation with offset "a * exp(-0.5 * pow((x-b) / c, 2.0)) + Offset", with parameters a = 1.4554605225541448E+02, b = 1.5839534147665826E+00, c = -5.6899030064297440E-01, and Offset = 4.6871995528236809E+01 gave an OK fit yielding R-squared = 0.9865 and RMSE = 6.742 – James Phillips Mar 09 '19 at 11:12

2 Answers2

1

Use nls to fit a line with a minimum and maximum on x like this. a and b are the x values of the intersection points and .lin1 is the intercept of the middle portion and .lin2 is the slope of the middle portion.

fm <- nls(y ~ cbind(1, pmax(pmin(x, b), a)), alg = "plinear", start = list(a = 2, b = 3))

giving:

Nonlinear regression model
  model: y ~ cbind(1, pmax(pmin(x, b), a))
   data: parent.frame()
       a        b    .lin1    .lin2 
   1.774    2.764  425.463 -134.940 
 residual sum-of-squares: 530.7

Number of iterations to convergence: 5 
Achieved convergence tolerance: 6.489e-09

The horizontal portions are at the y values corresponding to the x values of the intersection points:

predict(fm, list(x = coef(fm)[1:2]))
## [1] 186.06667  52.53333

or can be computed as the y values corresponding to the smallest and largest x values:

predict(fm, list(x = range(x)))
## [1] 186.06667  52.53333

We can plot the points and the fit like this:

plot(y ~ x)
xx <- seq(min(x), max(x), length = 100)
p <- predict(fm, list(x = xx))
lines(p ~ xx, col = "red")

screenshot

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This really is a big help, thank you! The only problem is that I have 60 of these datasets and the range of x differs. So for some the starting values are working and for some not. – Jana Mar 11 '19 at 11:07
  • For example another dataset is: x <- c(2.434, 2.89, 3.08, 3.15, 3.404, 4.494, 3.674, 3.758, 3.9, 4.13) y <- c(129.2,104,96.6,105.4,85.6,71.4,82.4,65.4,64.4,65.6) where Error: singular matrix 'a' in solve appears For some other datasets: Error: step factor 0.000488281 reduced below 'minFactor' of 0.000976562 appears instead. – Jana Mar 11 '19 at 11:15
  • I tried these as starting values: a0 = min(x, na.rm = FALSE) + 0.3; b0 = max(x, na.rm = FALSE) - 0.3 – Jana Mar 11 '19 at 11:16
  • Assuming you have removed NA's, another possibility is `start = list(a = mean(head(x, k)), b = mean(tail(x, k)))` where `k` is chosen appropriately. Your starting values correspond to k=1. If some of your data does not really follow this model then you likely won't get convergence for those sets so it may not be realistic to expect a completely automatic result. – G. Grothendieck Mar 11 '19 at 13:01
1

Updated answer to require less human interaction, Still, basically the same answer as before.

You can coarsely group the points into the upper and lower points automatically. Some of the transition points might get grouped with the upper or lower points, so we use boxplot.stats to eliminate anything that looks like an outlier in these groups. We can then take the mean of the high and low points to estimate the heights for the horizontal lines. We also use the non-outlier upper and lower points to determine the x values for the transitions.

HighLine = (2*max(y) + min(y))/3
HighPoints = which(y >= boxplot.stats(y[y>HighLine])$stats[1])
HighY = mean(y[HighPoints])

LowLine = (max(y) + 2*min(y))/3
LowPoints = which(y <= boxplot.stats(y[y<LowLine])$stats[5])
LowY = mean(y[LowPoints])

x1 = max(x[HighPoints])
x2 = min(x[LowPoints])

plot(x,y)
lines(c(min(x), x1,x2, max(x)), c(HighY, HighY, LowY, LowY))

Piecewise Linear fit

G5W
  • 36,531
  • 10
  • 47
  • 80