I am using the drm() package to fit dose-response curves. I get an unexpected linear result from one dataset that has a classic sigmoid shape that should be a great fit. When I remove two outliers, drm() fits well. When I code non-linear regression by hand with the outliers, it fits well.
Why is drm() not working as expected when these two outliers are present? It should still capture the sigmoidal shape.
My data is the following:
xvals <- c(5.9e-10,1.32e-09,2.95e-09,1.47e-08,3.3e-08,7.37e-08,1.65e-07,3.69e-07,8.24e-07,1.84e-06,4.12e-06,9.22e-06,2.06e-05,4.61e-05,NA,5.9e-10,1.32e-09,2.95e-09,6.6e-09,1.47e-08,3.3e-08,7.37e-08,1.65e-07,3.69e-07,8.24e-07,1.84e-06,4.12e-06,9.22e-06,2.06e-05,4.61e-05,5.9e-10,1.32e-09,2.95e-09,6.6e-09,1.47e-08,3.3e-08,7.37e-08,1.65e-07,3.69e-07,8.24e-07,1.84e-06,4.12e-06,9.22e-06,2.06e-05,4.61e-05 )
yvals<-c(0.82966561,4.46055817,3.71339205,2.99750244,2.55537779,6.73492758,51.88282864,176.9312926,-46.02981355,-65.4610744,-65.14554737,-77.16448982,-66.20915802,-65.72309865,NA,10.11865308,1.33800105,1.25214635,1.58224315,3.88199733,7.57852483,0.76265989,5.56251398,18.70036528,-2.41340405,-47.71349527,-48.42671277,-49.38001241,-55.46415588,-59.49426595,4.84234111,5.17505193,3.76047711,0.67847071,17.52378196,11.90207166,2.36100264,9.13674669,18.1938161,-20.79653471,-70.00100539,-67.05360765,-51.15310137,-72.23892517,-50.99236881)
The two "outliers" that seem to be messing things up are the yvals "51.88282864" and "176.9312926".
Using this code, I get this prediction:
df <- data.frame(xvals, yvals)
ldMod <- drm(yvals ~ xvals, data=df, fct = LL.4())
yhat <- predict(ldMod, df)
plot= ggplot(data=df, aes(x=xvals, y=yvals)) +
geom_point(size=2) +
scale_x_log10() +
geom_line(data=df, aes(x = xvals, y = yhat), linetype=2)
print(plot)
This gives this plot: unexpected linear prediction from drm()
However, when I remove the two outliers by manually changing them to "NA", I get this plot expected sigmoidal prediction from drm().
This discrepency did not seem right, so I manually coded to perform a nonlinear regression using the hill model formula and optim() (which I know drm() uses as well). Then, I get the roughly the same results as drm() gave when I removed the outliers. This is the result I would expect:
#define formula
form <- function(theta0, theta1, theta2, theta3, x) {
return(theta0 + (theta1 * ((x^theta2)/(theta3^theta2 + x^theta2))))
}
optim_ic = c(1,1,1,1)
min.rss <- function(data, pars) {
theta0 = pars[1]
theta1 = pars[2]
theta2 = pars[3]
theta3 = pars[4]
resid = y - (form(theta0, theta1, theta2, theta3, x))
return(sum(resid^2))
}
m = optim(par = optim_ic,
fn = min.rss,
data = df,
method = "Nelder-Mead",
hessian = TRUE)
theta0hat <- m$par[1]
theta1hat <- m$par[2]
theta2hat <- m$par[3]
theta3hat <- m$par[4]
ggplot(df) +
geom_point(aes(x=xvals, y = yvals)) +
stat_function(fun = function(x) form(theta0hat, theta1hat, theta2hat, theta3hat, x))
Expected result from non-linear regression by hand
I have tried using different specifications with drm(), such as specificying robust methods like "lts", or telling it to use the "nelder-mead" algorithm. Nothing seems to help. Any ideas? Am I missing something with this package?