R: Using "group_by" with "nls()"

Question

I have a dataset that I want to fit a Gompertz model grouped by 4 different factors (subject, race, target & distractor). The Gompertz model works when applied to the entire data set (i.e., without applying "group_by"). The group_by function works when I use a (much simpler) linear regression. However, when I try to use group_by with the Gompertz model I get the following error:

Error in chol2inv(object$m$Rmat()) : 
  element (3, 3) is zero, so the inverse cannot be computed
In addition: Warning messages:
1: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda -  :
  Convergence failure: false convergence (8)
2: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda -  :
  Convergence failure: singular convergence (7)

Here is the code:

grouped_data = all_merged %>%
  group_by(subject,race,target,distractor)

gomp_fits = do(grouped_data, tidy(nls(yt ~ ymin+ymax*(exp(-exp((alpha* 2.718282/ymax)*(lambda-time)+1))), data = ., start = list(lambda = 0.480, alpha = 5.8, ymin = 0, ymax = 1.6),
                                 control = list(warnOnly = TRUE),
                                 algorithm = "port",
                                 lower = c(0,-Inf, -Inf, 0),
                                 upper= c(2, Inf, Inf, 2))))

Thank you!

You are grouping by 4 different variables and trying to fit a 4 parameter model. Are you sure you have enough samples (> 5) in every group to perform a fit? — Dave2e, Jun 11 '21 at 21:23
If you do have sufficient data then try using the overall values obtained from the ungrouped nls as the starting values for each group. — G. Grothendieck, Aug 06 '21 at 13:52

gradcylinder · Answer 1 · 2021-08-04T17:19:04.150

TLDR

Consider nlsLM, a self-starting Gompertz model or use a method to calculate starting values, use it in a group_modify workflow.

Maybe something like this (though upper and lower limits may not be necessary

fit_gomp <- function(data, ...) {
    nlsLM(formula = y ~ SSgompertz(x, Asym, b2, b3),
          data = data,
          lower = c(0,-Inf, -Inf, 0),
          upper = c(2, Inf, Inf, 2),
          ...) %>% tidy()
}

data %>%
  group_by(subject, race, target, distractor) %>%
  group_modify(~ fit_qomp(data = .x), .keep = TRUE)

Getting starting values

While I haven't used a Gompertz model, consider if you can find a way to get starting values mathematically.

For example, let's say I want to fit a quadratic-plateau model (it only has 3 starting parameters however). First I have a function that defines the equation, which will go inside nls later.

# y = b0 + b1x + b2x^2
# b0 = intercept
# b1 = slope
# b2 = quadratic term
# jp = join point = critical concentration

quadp <- function(x, b0, b1, jp) {
    b2 <- -0.5 * b1 / jp
    if_else(
        condition = x < jp,
        true  = b0 + (b1 * x) + (b2 * x * x),
        false = b0 + (b1 * jp) + (b2 * jp * jp)
    )
}

The second part is to make a fitting function that fits a quadratic polynomial, uses those coefficients as starting values in the nls portion, and fits the nls model.

fit_quadp <- function(data, ...) {
    # get starting values from simple quadratic
    start <- lm(y ~ poly(x, 2, raw = TRUE), data = data)
    start_values <- list(b0 = start$coef[[1]], # intercept
                         b1 = start$coef[[2]], # slope
                         jp = median(data$x)) # join-point
    
    # nls model that uses those starting values
    nlsLM(formula = y ~ quadp(x, b0, b1, jp),
        data = data,
        start = start_values,
        ...
    ) %>% tidy()
}

The ... is to add arguments for nls.control if needed.

Analyzing grouped data

As for analyzing grouped data, I use group_modify() because it returns a data frame whereas group_map() returns a list. So my basic workflow looks like:

dataset %>%
    group_by(grouping_variable_1, grouping_variable_2, ...) %>%
    group_modify(~ fit_quadp(data = .x), .keep = TRUE)

Then out comes a table with all the tidy statistics because tidy() was used in the function. You can consider including a try() wrapped around the nls() portion of the function so that if it succeeds on the first two groups but on the third, it'll still continue and you should still get some results.

nlsLM()

Also, if you want to use nlsLM from minpack.lm, the algorithm there succeeds more than those available in nls(). Some worry about false convergence, but I haven't seen it yet in my applications. Also with nlsLM you may not need to bother with upper and lower limits, though they can still be set.

R: Using "group_by" with "nls()"

1 Answers1

TLDR

Getting starting values

Analyzing grouped data

nlsLM()