0

I am trying to fit a non-linear model using nls in R, however, I can't seem to get the model specification correct. Any advice on how to correct the model, would be much appreciated. A minimal example below.

# Import dummy data 
data <- read.csv("https://raw.githubusercontent.com/guysutton/nls_singular_repo/main/data2.csv")
head(data)

I think the specification of the fixed effects is what is causing the issue. I am not experienced with fitting 'nls' and picking starting values, as several other StackOverflow posts have indicated that the error I am getting could be because of poor starting values. I have tried starting pH(1:3, in increments of 0.1).

# Fit NLS model 
mod <- stats::nls(
  # Response variable
  y ~
    # Predictors 
     ( (x1 + x2) / (-0.6258*pH^4+11.847*pH^3-79.884*pH^2+230.97*(pH)-242.64) ),
  start = list(pH = 1),
  control = nls.control(maxiter = 1000),
  data = data
)

Error in nls(y ~ ((x1 + x2)/(-0.6258 * pH^4 + 11.847 * pH^3 - 79.884 * : singular gradient

However, when I take the fixed effect expression and manually input values for the first row of data, I get the correct value.

# This produces the correct value for the first row of data when I sub in the values 
(x1 + x2) / (-0.6258*pH^4+11.847*pH^3-79.884*pH^2+230.97*(pH)-242.64)
(1.50 + 0.45) / (-0.6258*4.41^4+11.847*4.41^3-79.884*4.41^2+230.97*(4.41)-242.64)

[1] 1.132759

Any advice on what might be causing my issue, and how to potentially solve this issue, would be much appreciated.

Guy Sutton
  • 45
  • 3
  • 1
    pH is only defined in [0,14]. Your denominator can become zero in this interval: `curve(-0.6258*x^4+11.847*x^3-79.884*x^2+230.97*(x)-242.64, from = 0, to = 14); abline(0,0)` This model sucks (unless you constrain pH to a specific interval where values of the denominator are never zero). – Roland Sep 01 '22 at 09:14
  • 1
    The point of `nls` is to find the coefficients in your model, but you don't have any coefficients to find. You are presupposing them; the numbers -0.6258, 11.847, -79.884, 230.97, and -242.64 are typically what you would use `nls` to find. If you already have these numbers, then you already have your model. – Allan Cameron Sep 01 '22 at 09:21

1 Answers1

1

The problem (as pointed out in the comments) is that you are trying to determine the coefficients for your model with nls. The way you currently have the model set up, you are trying to predict a single pH value that describes all the variability in y given x1 and x2. I think what you really want to to determine the coefficients for your input pH values that give the best model. Maybe something like:

library(tidyverse)

mod <- stats::nls(
  # Response variable
  y ~
    # Predictors 
     ( (x1 + x2) /(k*pH + k2*pH^2 + k3*pH^3 + k4*pH^4) ),
  start = list(k = 10, k2 = 1, k3 = 0.3, k4 = 0.03),
  control = nls.control(maxiter = 1000, tol = 1e-3, minFactor = 1e-5),
  data = data
)

summary(mod)
#> 
#> Formula: y ~ ((x1 + x2)/(k * pH + k2 * pH^2 + k3 * pH^3 + k4 * pH^4))
#> 
#> Parameters:
#>      Estimate Std. Error t value Pr(>|t|)    
#> k  -113.65175    4.31320  -26.35   <2e-16 ***
#> k2   82.57272    3.20163   25.79   <2e-16 ***
#> k3  -19.84041    0.78430  -25.30   <2e-16 ***
#> k4    1.58837    0.06349   25.02   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.6587 on 5076 degrees of freedom
#> 
#> Number of iterations to convergence: 13 
#> Achieved convergence tolerance: 0.0007377

data|>
  mutate(mod = predict(mod)) |>
  ggplot(aes(y, mod))+
  geom_point()+
  scale_x_log10()+
  scale_y_log10()

AndS.
  • 7,748
  • 2
  • 12
  • 17
  • Also, it looks like you are doing some kind of expansion with the pH in the bottom, maybe because pH is log transformed [H+]. I'm not totally sure what you are trying to model, but there is probably a better function for the denominator. – AndS. Sep 01 '22 at 10:55