I'm trying to use the quantreg package to fit an exponential curve.

Here is a reproducible example. My real data are much more complex and contain outliers, which is why I would rather not use nls, which is not robust to outliers.

library(quantreg)
library(ggplot2)

x = 1:100
set.seed(42)
y = 500*exp(-0.02*x) +rnorm(100, 0, 5 )
df = data.frame(cbind(x,y))
plot(df)

formula =  y ~ k * exp(b*x) 
qr_exp = nlrq(formula,
              data = df,
              start = list(k = 600, b = -0.01),
              tau = .50,
              control = nlrq.control(maxiter = 1000))
summary(qr_exp)
sum(qr_exp$m$resid())
[1] -26.52373

I expected sum(qr_exp$m$resid()) to be around 0 since tau = 0.5, but the value is negative, which means the model tends to overestimate the real values.
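To make that expectation concrete, here is a minimal base-R sketch of the simplest possible case, an intercept-only fit, where the least-absolute-deviation solution is just the sample median. The data and the names `y_skew` and `lad_loss` are hypothetical and unrelated to `df`. Whether the same behaviour carries over to a nonlinear `nlrq` fit is an assumption, not something this sketch proves.

```r
# Hypothetical skewed sample; odd length so the median is unique
set.seed(1)
y_skew <- rexp(101, rate = 0.1)

# Sum of absolute deviations around a candidate centre m
lad_loss <- function(m) sum(abs(y_skew - m))

# Numerical minimiser of the LAD loss, to compare with median()
m_hat <- optimize(lad_loss, interval = range(y_skew))$minimum
c(optimised = m_hat, median = median(y_skew))

# Residuals around the median
res <- y_skew - median(y_skew)
table(sign(res))  # counts of negative, zero and positive residuals
sum(res)          # sum of residuals, which nothing forces to be 0
```

In this toy case the LAD solution balances the number of residuals on each side, not their sum, so a non-zero residual sum at tau = 0.5 is not by itself a sign of a failed fit.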

As you can see, the sum of the residuals is closer to 0 with tau = 0.47:

formula =  y ~ k * exp(b*x) 
qr_exp = nlrq(formula,
              data = df,
              start = list(k = 600, b = -0.01),
              tau = .47,
              control = nlrq.control(maxiter = 1000))
summary(qr_exp)
sum(qr_exp$m$resid())
[1] -4.467781

I don't really understand why.

Is it because there could be an infinite number of solutions, and so no guarantee of having as much negative residual as positive residual?

If so, what is the best approach if it is very important for me to:

  • minimise the least absolute deviation rather than the least squares deviation (which is not robust to outliers)
  • have balanced residuals?

Could it make sense to add a small L2 penalty to get something balanced? (see Huber loss)
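For concreteness, a minimal sketch of what a Huber-type fit could look like using only base R's `optim` on the same simulated data. This is not a quantreg feature: `huber` and `huber_objective` are hypothetical helper names, and `delta = 5` and the `parscale` values are arbitrary assumptions chosen for this toy example.

```r
# Same simulated data as in the example above
x <- 1:100
set.seed(42)
y <- 500 * exp(-0.02 * x) + rnorm(100, 0, 5)

# Huber loss: quadratic for |r| <= delta, linear beyond it
huber <- function(r, delta) {
  ifelse(abs(r) <= delta, 0.5 * r^2, delta * (abs(r) - 0.5 * delta))
}

# Objective for y ~ k * exp(b * x); par = c(k, b)
# delta = 5 is an assumed tuning value, not a recommendation
huber_objective <- function(par, delta = 5) {
  r <- y - par[1] * exp(par[2] * x)
  sum(huber(r, delta))
}

# Nelder-Mead (optim's default); parscale compensates for the very
# different magnitudes of k and b
fit <- optim(par = c(600, -0.01), fn = huber_objective,
             control = list(parscale = c(100, 0.01), maxit = 2000))
fit$par

# Residuals of the Huber fit and their sum
res_huber <- y - fit$par[1] * exp(fit$par[2] * x)
sum(res_huber)
```

Small residuals are treated as in least squares and large ones as in LAD, with `delta` setting the switch point. Nothing in this objective forces the residual sum to be exactly 0 either, so it would still need to be checked on the real data.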

  • you are using quantile regression instead of least squares, are you aware of this? (`nls`, or `optim` with a hand-built objective function, should be enough for your problem) – Colonel Beauvel Oct 27 '15 at 16:19
  • As alluded to above, I'm not sure why you expect the residuals to be symmetric around an estimate of the median. There should be the same _number_ of observations above and below the median, but their magnitudes may be quite different. – joran Oct 27 '15 at 16:28
  • @joran that's not true. Quantile regression = Least Absolute Deviation when tau=0.5, and we don't want half of the records above; we want to minimise the sum of the absolute deviations. –  Oct 27 '15 at 16:36
  • @ColonelBeauvel This is just a basic reproducible example. It is really important for me not to be affected by outliers and to minimise the LAD. That's why `nls` doesn't fit here. –  Oct 27 '15 at 16:38
  • this question seems more suitable for [CrossValidated](http://stats.stackexchange.com). Also, it would be good to use `set.seed()` so your example is reproducible ... – Ben Bolker Oct 27 '15 at 16:45
  • I'm certainly no quantile regression expert, I just mean to say that based on what I _do_ know I don't share your intuition/expectation that the residuals ought to be symmetrically distributed around the conditional median in this case. – joran Oct 27 '15 at 17:03

0 Answers