nls model validation issues for data fitting in r

Question

Following are the sample raw data points (actual data) as shown in the points plot. Trying to validate if the given model can produce a reasonable fit for this data set.

R is complaining about the data, the model with and without initial values, with and without maxiter set; another error too.

Latest error: step factor 0.000488281 reduced below 'minFactor' of 0.000976562

Excel solver did a decent preliminary job to derive initial values, and a fit than R without a single complaint. However Excel is not sufficient to complete the task and I would like to get some help here to get a better fit and to resolve this error. Open to both r and python codes (if python libraries can handle this better than r nls...).

model equation :

y.nls <- nls(y ~ a2 + ((a1 - a2)*(((Lc/K)^n*H)/((1 + (Lc/K) )^n*H))) , 
             start = c(n = n.init, H = H.init, K = K.init, a2 = a2.init),
             control = list(maxiter = 100), trace = TRUE)

#initial values for variables:
n.init <- 0.6
H.init  <- 2.6
K.init <- 0.4
a1 <- 0.01
a2.init <- 0.75

sample data:

Lc <- c( 0.001 ,0.002 ,0.003 ,0.004 ,0.005 ,0.006 ,0.007 ,0.008 ,0.01 ,0.05 , 0.08 ,0.1 ,0.049554 ,0.099109 ,0.1486635 ,0.1783962 ,0.198218 ,0.24479923 ,0.29534482 ,0.3468815 ,0.396436 ,0.43409742 ,0.495545 ,0.5450995 ,0.594654 ,0.6442085 ,0.693763 ,0.7433175 ,0.792871 ,0.941534313 ,0.99108875 ,1.486634 ,1.982178667 ,2.477723333 ,2.477723333 ,2.973267327 ,4.45990099 ,5.946534653 ,7.928712871 ,9.415346535 ,12.88415842 ,16.3529703 ,19.32623762 ,19.32623762 ,22.7950495 ,22.7950495 ,27.33423762 ,27.33423762 ,27.33423762 ,27.33423762 ,29.23712871 ,29.23712871 ,32.70594059 ,35.67920792 ,35.67920792 ,37.66138614 ,37.66138614 ,39.1480198 ,39.1480198 ,39.1480198 ,59.46534653 ,59.46534653 ,79.26750804 ,79.26750804)

y <- c(0.7336301 ,0.7300885 ,0.7302002 ,0.7265662 ,0.7217741 ,0.7223443 ,0.7238027 ,0.71864 ,0.7094976 ,0.6989751 ,0.6764343 ,0.6598028 ,0.410136 ,0.3917024 ,0.3777954 ,0.3685038 ,0.3590556 ,0.3537131 ,0.3488757 ,0.343905 ,0.3402811 ,0.3378536 ,0.3371936 ,0.3337051 ,0.3308202 ,0.3284294 ,0.328673 ,0.3269062 ,0.3233758 ,0.3233758 ,0.3210403 ,0.3189582 ,0.3163805 ,0.3135109 ,0.3126578 ,0.3114899 ,0.3090119 ,0.3056575 ,0.3040519 ,0.301711 ,0.3005168 ,0.2975471 ,0.2903744 ,0.2960987 ,0.2874757 ,0.2914471 ,0.2900818 ,0.2900818 ,0.2841886 ,0.2841886 ,0.2807069 ,0.2861011 ,0.2829085 ,0.2770104 ,0.2824439 ,0.2748469 ,0.2797121 ,0.2669634 ,0.2723519 ,0.2768586 ,0.2676962 ,0.2730733 ,0.2656272 ,0.2706228)

I tried your this [solution too](https://stackoverflow.com/a/42513058/1389394) But not getting anywhere close to a solution... — bonCodigo, Sep 24 '22 at 07:40
@G.Grothendieck that's a great start. Well you have changed the model in your mssg. The model is given as, a2 + ((a1-a2) * ((Lc/K)^nH)/( 1 + (Lc/K))^nH)) Is there any reason you changed it or is it a typo perhaps? Model was said to be a derivative of Hills binding equation. Then a2 = 1 is a little too far from the actual value which is expected to be close to ~0.73 It's important to know the n, H, K values from the fit. I don't mind keeping a1 fixed as it is expected to be close to zero as Lc goes to infinity... I am open to _fixing_ of the model to explain the data correctly — bonCodigo, Sep 24 '22 at 08:10
I have moved my comments to an answer and expanded on them there. — G. Grothendieck, Sep 24 '22 at 11:11

G. Grothendieck · Answer 1 · 2022-09-25T12:54:12.130

There are several problems:

there is an error in the parentheses in the RHS shown in the question. If the equation in the question were correct then H would cancel in the numerator and the denominator. The equation in the Wikipedia article on the Hill equation is also different from what is shown in the question.
even if we fix the parentheses there is still the problem that the parameters are not identifiable because multiplying K by any number x and multiplying H by x^n gives the same value to the RHS. Therefore there is no unique value of the parameters at the optimum even in principle -- there are an infinite number of combinations. It is not a matter of software. We must fix either H or K.

Fixing these problems we have the following. We fix H=1 and we have the used the plinear algorithm in nls. It requires a matrix RHS such that each column is implicitly multiplied by a parameter and then the columns are summed together as a vector sum to get the RHS. The implicit parameters, which are linear as opposed to the explicit parameters are nonlinear, receive the names of their column prefixed with .lin.. The implicit/linear parameters do not require starting values simplifying the setup.

Note that a2 is indeed 0.73 consistent with your comment under the question and it converges in only 12 iterations from the starting values we used.

H <- 1
y.nls <- nls(y ~ cbind(a2 = 1, a1minusa2 = ((Lc/K)^n*H)/( 1 + (Lc/K)^n*H)),               
  start = c(n = 1, K = 0.1), algorithm = "plinear")
y.nls
## Nonlinear regression model
##   model: y ~ cbind(a2 = 1, a1minusa2 = ((Lc/K)^n * H)/(1 + (Lc/K)^n *     H))
##    data: parent.frame()
##              n              K        .lin.a2 .lin.a1minusa2 
##        1.42432        0.09251        0.73086       -0.44070 
##  residual sum-of-squares: 0.1215
##
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 3.591e-06

plot(y ~ Lc)
lines(fitted(y.nls) ~ Lc, col = "red")

If you don't want to reparameterize from (a1, a2) to (a1minusa2, a2) then at the expense of a more complicated RHS we could collect each of the a1 and a2 terms:

H <- 1
y.nls2 <- nls(y ~ cbind(
    a1 = (Lc/K)^n*H/( 1 + (Lc/K)^n*H),
    a2 = 1 - (Lc/K)^n*H/( 1 + (Lc/K)^n*H)
  ), 
  start = c(n = 1, K = 0.1), algorithm = "plinear")
y.nls2
## Nonlinear regression model
##   model: y ~ cbind(a1 = (Lc/K)^n * H/(1 + (Lc/K)^n * H),
##    a2 = 1 - (Lc/K)^n *     H/(1 + (Lc/K)^n * H))
##    data: parent.frame()
##       n       K .lin.a1 .lin.a2 
## 1.42432 0.09251 0.29016 0.73086 
##  residual sum-of-squares: 0.1215
##
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 3.732e-06

Appreciate this effort. One fact I can assure you, that the model can't be identical to Hill in wiki; it's a derivative and there are variations to it, depending on the conditions in binding assays; so we have to omit that being a problem here. However I do agree this model is not near optimum. I do not understand this part in your answer: `a1minusa2 = (` a1-a2 is not equal to the rest of the equation. a1-a2 is part of the equation. — bonCodigo, Sep 24 '22 at 11:49
We can coalesce a1-a2 into a single parameter a1minusa2 so that it is linear and therefore with plinear we can omit its starting value. Rearranging we can get a1 = a1minusa2 + a2. I really do think that the Wikipedia article shows that the equation in the question cannot be correct. It's not a matter of being absolutely identical but rather the form in the question is wrong. — G. Grothendieck, Sep 24 '22 at 11:55

nls model validation issues for data fitting in r

1 Answers1