R fitting an S-shaped or cumulative function to estimate values outside data range

Question

I have some data points on % penetration of a product for a specific time window and i would like to estimate the "before and after" that time window using an S-shaped function that starts around %penetration=zero and ends around %penetration=1, so effectively like a cumulative distribution function.

Here's the data:

period<-seq(13,73,1)
value<-c(0.267, 0.292, 0.317, 0.342, 0.367, 0.394, 0.422, 0.450, 0.477, 0.492, 0.507, 0.522, 0.537, 0.562, 0.587, 0.612, 0.637, 0.645, 0.652, 0.660, 0.667, 0.675, 0.682, 0.690, 0.697, 0.704, 0.711, 0.718, 0.724, 0.729, 0.734, 0.739, 0.743, 0.748, 0.753, 0.757, 0.762, 0.766, 0.770, 0.774, 0.778, 0.782, 0.786, 0.790, 0.794, 0.797, 0.800, 0.803, 0.806, 0.809, 0.812, 0.814, 0.817, 0.820, 0.822, 0.824, 0.827, 0.829, 0.831, 0.834, 0.836)

So my time window goes from period=13 to period=73 and I would like to estimate values for periods 1 to 12 (starting close to value=zero) and periods 74 to 140 (ending close to value=1). Any idea how i can do this...? thanks in advance, Chris

G. Grothendieck · Answer 1 · 2014-01-14T17:22:44.037

Below we show 4 variations and plot them so we can visually assess the fit.

1) LL.4 We could try the LL.4 model of the drc package:

> library(drc)
> value.m1 <- drm(value ~ period, fct = LL.4())
>
> before <- predict(value.m1, data.frame(period = 1:12)); before
 [1] -0.021014377 -0.014620515 -0.003811402  0.011217655  0.030115032
 [6]  0.052416255  0.077583272  0.105041527  0.134213115  0.164544170
[11]  0.195525422  0.226705738
> after <- predict(value.m1, data.frame(period = 74:140)); after
 [1] 0.8295972 0.8309413 0.8322356 0.8334824 0.8346839 0.8358422 0.8369594
 [8] 0.8380374 0.8390778 0.8400824 0.8410528 0.8419904 0.8428968 0.8437731
[15] 0.8446209 0.8454411 0.8462350 0.8470037 0.8477483 0.8484696 0.8491687
[22] 0.8498463 0.8505035 0.8511409 0.8517594 0.8523597 0.8529424 0.8535083
[29] 0.8540579 0.8545920 0.8551109 0.8556154 0.8561059 0.8565830 0.8570470
[36] 0.8574986 0.8579380 0.8583658 0.8587823 0.8591880 0.8595831 0.8599681
[43] 0.8603433 0.8607090 0.8610656 0.8614132 0.8617523 0.8620831 0.8624058
[50] 0.8627207 0.8630281 0.8633282 0.8636213 0.8639074 0.8641869 0.8644600
[57] 0.8647268 0.8649875 0.8652424 0.8654915 0.8657351 0.8659733 0.8662063
[64] 0.8664342 0.8666572 0.8668754 0.8670889
> plot(value.m1, xlim = c(1, 140), ylim = 0:1, pch = 20)

The extrapolation going forward looks like it might be systematically too low and the before values go below zero but perhaps that is sufficient for your needs.

(continued after image)

screenshot LL.4

2) MM.3 Visually, the MM.3 model produces a better fit at the upper end but a bit worse at the lower end:

value.m2 <- drm(value ~ period, fct = MM.3())
plot(value.m2, xlim = c(1, 140), ylim = 0:1, pch = 20)

(continued after image)

screenshot MM.3

3) Combined One could pick up the best of both using the following (continuing on from the last plot):

value.pred1 <- predict(value.m1, data.frame(period = 1:140))
value.pred2 <- predict(value.m2, data.frame(period = 1:140))
value.pred12 <- c( head(value.pred1, 19), tail(value.pred2, -19) )
lines(1:140, value.pred12, col = "red")

giving:

screenshot combined

4) Cumulative Scale If we fit on the cumulative scale LL.4 gives a visually pleasing plot:

DF <- data.frame(value.cum = cumsum(value), period)
fm <- drm(value.cum ~ period, data = DF, fct = LL.4())
plot(fm, xlim = c(1, 140), ylim = c(0, 100), pch = 20)

cumulative fit screenshot

ADDED MM.3 model, combined model and cumulative model.

But in the long run is this "better" than my cheapo approach? I fear beauty is in the eye of the OP :-) — Carl Witthoft, Jan 14 '14 at 15:27
that's absolutely fantastic, thank you so much! i'd never come across the drc package before. Brilliant. — user3193734, Jan 14 '14 at 18:34

score 0 · Answer 2 · answered Jan 14 '14 at 12:22

I plotted your dataset, and frankly do not recommend an "S" curve. I would do a linear fit over period < 30 and a separate power fit, perhaps y = k*sqrt(x) to period > 30 .

Be warned that extrapolation outside a dataset is always a risky move unless you have great faith that the data are based on the function (e.g. linear) you used to fit to the dataset.

R fitting an S-shaped or cumulative function to estimate values outside data range

2 Answers2