1

I am trying to calculate percentiles or quantils for data which considerably scatter. Using the Loess function the mean is nicely presented, however, I cannot get percentile/quantils from this function.

I tried to combine quantreg with loess. This plot shows linear curves instead of loess smoothed curves.

I would like to get a result similar to this: enter image description here

data(cars)
plot(cars)
lmodel <- loess(cars$dist~cars$speed,span = 0.3, degree = 1)
lpred<-predict(lmodel, newdata= 5:25,se=TRUE)
lines(5:25, lpred$fit,col='#000066',lwd=4)
lines(5:25, lpred$fit - qt(0.975, lpred$df)*lpred$se, lty=2)
lines(5:25, lpred$fit + qt(0.975, lpred$df)*lpred$se, lty=2)


#### combination of quantreg with loess

plot(cars$speed,cars$dist)
xx <- seq(min(cars$speed),max(cars$speed),1)
f <- coef(rq(loess(cars$dist~cars$speed,span = 0.3, degree = 1), tau=c(0.1,0.25,0.5,0.75,0.9)) )
yy <- cbind(1,xx)%*%f
for(i in 1:length(taus)){
  lines(xx,yy[,i],col = "gray")
}


I also tried the suggested code, however, I could not change the settings of the smoothing. The lines showed wavy path.

library(quantreg)
data(cars)
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9, degree = 1)
rqmodel <- rq(lmodel, tau = taus, data = cars)
f <- coef(rqmodel)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
yy <- predict(rqmodel)
plot(cars)
matlines(xx, yy, col = "grey",lwd=3)

enter image description here

The Loess function does not provide data for quantiles as the rg would.

However, the Loess functions allows to get a curve without zigzag. Please see the code snip. What would be the setting for tau=0.5 using the rg function to produce the same results compared with Loess function.

data(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9 )
plot(cars)
lines( x=4:25 , y=predict(lmodel, newdata= data.frame(speed=4:25)) ,col="Blue")

enter image description here

Niels
  • 141
  • 12

2 Answers2

1

I believe the code in the question is mixing loess and quantile regressions when they are different methods and the latter does not need the former.

I will try to fit both and plot the respective results. In the code below I will use matlines, not a for loop.

These code lines are common.

library(quantreg)

data(cars)

xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))

First the loess model.

lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
ls_yy <- cbind(ls_yy$fit, 
               ls_yy$fit - 2*ls_yy$se.fit, 
               ls_yy$fit + 2*ls_yy$se.fit)

plot(cars)
matlines(xx, ls_yy, col = "darkgrey")

enter image description here

Now quantile regression.

taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rqmodel <- rq(dist ~ speed, tau = taus, data = cars)

rq_yy <- predict(rqmodel)

plot(cars)
matlines(xx, rq_yy, col = "darkgrey")

enter image description here

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • I tried the code, however, it gives very narrow values for each time point. I thought to get something which looks like this: – Niels Jun 01 '19 at 20:30
  • [link] (https://i.stack.imgur.com/7Sgow.png) Please check this link here or the image above. I created this with GAMLSS. GAMLSS requires that the algorithm converges. The examples which I am looking into do not find a solution in GAMLSS. These data do not follow the available distributions of GAMLSS. – Niels Jun 02 '19 at 06:14
  • thanks, @rui-barradas , it looks like that the current quantreg is analyzing the LOESS model. Would it be possible to get the percentiles for the original data estimated? – Niels Jun 02 '19 at 08:16
  • @Niels Why don't you try `yy <- predict(rqmodel)`? It makes much more sense to use the `predict` methods for the models fitted, be it `loess` or `rq`. In this case it's the `rqmodel`, just change that code line. – Rui Barradas Jun 02 '19 at 08:27
  • Hi, thanks, @rui-barradas, I tried the code below. However, I could not smooth the resulting line. I changed the setting of the LOESS equation. I could not get an effect on it. Also, see the figure! ```library(quantreg) data(cars) taus <- c(0.1, 0.25, 0.5, 0.75, 0.9) lmodel <- loess(dist ~ speed, data = cars, span = 0.9, degree = 1) rqmodel <- rq(lmodel, tau = taus, data = cars) f <- coef(rqmodel) xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars)) yy <- predict(rqmodel) plot(cars) matlines(xx, yy, col = "grey",lwd=3)``` – Niels Jun 02 '19 at 15:03
  • @Niels I have completely changed the code, see if this is it. – Rui Barradas Jun 02 '19 at 15:29
  • Thanks. @rui-barradas , I was hoping to achieve more smooth lines for the quantile curves by combing loess and quantreg. The current approaches result in quantile curves which look wavy and have a lot of up and down. Maybe, what I have in mind sounds not mathematical correctly? Would be there a way to smooth the results from the quantreg model? Loess provides span and degree. Is there something similar for quantreg? Or should we use the results from the quantile curves (tau=0.1, 0.25, tau=0.5...) and smooth these curves in an additional step? Please let us know, thanks – Niels Jun 02 '19 at 16:39
  • @Niels Your data seems to be [heteroscedastic](https://en.wikipedia.org/wiki/Heteroscedasticity), the problem will not go way, at least not completely. – Rui Barradas Jun 02 '19 at 17:26
  • Thanks, @rui-barradas , when I use loess with this setting, then there is no zigzag curve. data(cars) lmodel <- loess(dist ~ speed, data = cars, span = 0.9 ) plot(cars) lines( x=4:25 , y=predict(lmodel, newdata= data.frame(speed=4:25)) ,col="Blue") how could I reproduce a similar curve using quantreg for tau=0.5 ? – Niels Jun 03 '19 at 05:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194351/discussion-between-niels-and-rui-barradas). – Niels Jun 03 '19 at 05:12
1

The code below (taken from an "answer") is not correct and should not be included in a correct solution. This would provide a 95% confidence interval on a fit, and the probability that interval lands on the true trend line. It does not correspond to a quantile computed from the data within the span of this moving average. A normal based approximation as recommended would require multiplying ls_yy$se.fit by sqrt(ni) where ni is the number of observations in the particular span. Unfortunately loess does not return ni, so this is not a tenable solution unless the span covers the entire dataset and ni can be set to n and there is no heteroskedasticity.

data(cars)
plot(cars)

lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)

#wrong - this does not denote quantiles for the input data:
ls_yy <- cbind(ls_yy$fit, 
               ls_yy$fit - 2*ls_yy$se.fit, 
               ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")

We can make this more obvious using a sample dataset with more observations. Samples 1 and 2 are identical, aside from their sample sizes (500 and 1500 observations), and therefore they should have very similar quantiles.

set.seed(1)
x1 = runif(500,0,10)
y1 = x1 + rnorm(length(x1))

x2 = runif(1500,0,10)
y2 = x1 + rnorm(length(x2))

dfpd = data.frame(x=1:9)

lmodel1 <- loess(y ~ x, data = data.frame(x=x1,y=y1), span = 0.5, degree = 1)
ls_yy1 <- predict(lmodel1, newdata=dfpd, se = TRUE)

lmodel2 <- loess(y ~ x, data = data.frame(x=x2,y=y2), span = 0.5, degree = 1)
ls_yy2 <- predict(lmodel2, newdata=dfpd, se = TRUE)

#the only difference between lmodel1 and lmodel2 is the number of observations
#the quantiles should be very similar, but their se values are a function of sample
#size and are thus quite different
ls_yy1$se
ls_yy2$se


ls_yy1$se / ls_yy2$se

We can see that the ratio of se values is around 60% which confirms that they cannot be used as-is for quantile calculations

ratio of se values