1

I am new to programming and using R software, so I would really appreciate your feedback to the current problem that I am trying to solve.

So, I have to fit a cumulative distribution with some function (two/three parameter function). This seems to be pretty straight-forward task, but I've been buzzing around this now for some time.

Let me show you what are my variables:

    x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196)
    y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999)

This is the plot where I set up x-axis as log: Raw data

After some research, I have tried with Sigmoid function, as found on one of the posts (I can't add link since my reputation is not high enough). This is the code:

# sigmoid function definition
sigmoid = function(params, x) {
  params[1] / (1 + exp(-params[2] * (x - params[3])))
}

# fitting code using nonlinear least square
fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))

# get the coefficients using the coef function
params=coef(fitmodel)

# asigning to y2 sigmoid function
y2 <- sigmoid(params,x)

# plotting y2 function
plot(y2,type="l")

# plotting data points
points(y)

This led me to some good fitting results (I don't know how to quantify this). But, when I look at the at the plot of Sigmuid fitting function I don't understand why is the S shape now happening in the range of x-values from 40 until 7 (looking at the S shape should be in x-values from 10 until 200).

Raw data

Since I couldn't explain this behavior, I thought of trying Weibull equation for fitting, but so far I can't make the code running.

To sum up:

  1. Do you have any idea why is the Sigmoid giving me that weird fitting?
  2. Do you know any better two or three parameter equation for this fitting approach?
  3. How could I determine the goodness of fit? Something like r^2?
numb
  • 119
  • 1
  • 1
  • 11
  • 2
    It's plotting the index of the array because you don't supply an x value. Try `plot(x, y2,type="l")` and `points(x,y)`. – Dan May 30 '17 at 14:54
  • @Lyngbakr Thank you. This resolves my first question. I typed `plot(x,y,type="l",log="x")` just to better see this S curvature. But this just confirms that the fit is not looking very good. – numb May 30 '17 at 15:06
  • My initial thought would be to try `a + b * tanh(x/c)`, but that gives crappy results, too... – Dan May 30 '17 at 15:15

2 Answers2

1
# Data
df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196),
           y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999))

# sigmoid function definition
sigmoid = function(x, a, b, c) {
  a * exp(-b * exp(-c * x))
}

# fitting code using nonlinear least square
fitmodel <- nls(y ~ sigmoid(x, a, b, c), start=list(a=1,b=.5,c=-2), data = df)

# plotting y2 function
plot(df$x, predict(fitmodel),type="l", log = "x")

# plotting data points
points(df)

enter image description here

The function I used is the Gompertz function and this blog post explains why R² shouldn't be used with nonlinear fits and offers an alternative.

Dan
  • 11,370
  • 4
  • 43
  • 68
  • Thanks. I wasn't aware of the existence of the Gompertz function. It indeed fits well even for my other data sets. Regarding the R^2, I also read somewhere that it's recommended to use it for non-linear regression and this link that you sent with Standard error of regression seems like the best way to go. – numb May 31 '17 at 07:47
  • By the way, there's also [an R package](https://cran.r-project.org/web/packages/sigmoid/index.html) specifically for sigmoid functions. – Dan May 31 '17 at 11:05
  • Awesome! This package has some models that I couldn't find in [growthmodels](https://cran.r-project.org/web/packages/growthmodels/growthmodels.pdf). You really helped me a lot!! – numb May 31 '17 at 13:04
  • Glad I could help! (And thanks for the heads up about growthmodels – I wasn't aware of that package.) – Dan May 31 '17 at 13:08
  • Well, after initial success I've ended up with a problem for "our" approach due to initial guess values when trying different data sets. This error kept showing _Error in nls(y ~ sigmoid::Gompertz(x, a, b, c), start = list(a = -1, b = 1, singular gradient_ ... So, after some research, I've found this package [drc package](https://cran.r-project.org/web/packages/drc/drc.pdf) which saved my life, up-till now. – numb Jun 01 '17 at 10:48
0

After going through different functions and different data-sets I have found the best solution that gives the answers to all of my questions posted.

The code is as it follows for the data-set stated in question:

df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196),
       y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999))


library(drc)
fm <- drm(y ~ x, data = df, fct = G.3()) #The Gompertz model G.3()
plot(fm)

#Gompertz Coefficients and residual standard error 

summary(fm)

The plot after fitting

numb
  • 119
  • 1
  • 1
  • 11