How can I apply a curve fitting function over different columns' levels in a data frame?

Question

Given the data frame df:

(time <- rep(c(1:9),9))
(test <- rep(c(1:3),3, each = 9))
(specie <- rep(c("Lbcasei", "Lbparacasei", "Lbrhamnosus"), each = 27))

(OD <- rep(c(runif(4,min=0, max=0.4),
                 runif(1,min=0.4, max=0.6),
                 runif(4,min=0.6, max=1)),
               9))
df <- data.frame(specie, test, time, OD)

and given the rquisite function curve_func:

curve_func <- function(t, A, B, C){B / (1 + exp(-C * (t - A))) }

I want apply the following curve fitting function fit , which uses as input OD values over time points time :

fit <- nls(OD ~ growth_func(time, A, B, C), start = list(A = min(time), B = max(OD), C = 0.05))

over all the levels of specie and test columns.

Can anybody help suggesting a function or any other way to iterate/loop it?

I went through the function of the Tidyverse, in particular dplyr,tidyr and purrr, but I did not find specific functions for my problem.

Also, I tried to dig into the theory behind Iteration in R, mainly following the guide RforDataScience, but I did not succeed in that. Thanks in advance, cheers!

G. Grothendieck · Answer 1 · 2023-06-13T16:01:40.897

A few points:

when using random numbers use set.seed first to make the result reproducible. We do so in the Note at the end. Also we make specie a factor.
the structure of the data is such that the data values for each specie value is the same so the parameters for the 3 values of specie will be the same and we can ignore the grouping. We do show without grouping and with.
use nls2 to get initial values
the name of the curve/growth function must correspond to the definition. We have corrected this.

1) Initial values and fit all ungrouped code--

library(nls2)

curve_func <- function(t, A, B, C){B / (1 + exp(-C * (t - A))) }


fit0 <- nls2(OD ~ curve_func(time, A, B, C), 
            df,
            start = data.frame(A = range(df$time), B = range(df$OD), C = 0:1),
            alg = "brute")
co <- as.list(coef(fit0))

fit1 <- nls(OD ~ curve_func(time, A, B, C), 
           df,
           start = list(A = co$A, B = co$B, C = co$C))
fit1
## Nonlinear regression model
##   model: OD ~ curve_func(time, A, B, C)
##    data: df
##      A      B      C 
## 5.3398 1.0203 0.7857 
##  residual sum-of-squares: 1.267
##
## Number of iterations to convergence: 17 
## Achieved convergence tolerance: 7.615e-06

2) This shows how to specify distinct groups although it is not needed with this data due to the fact that the groups, hence the parameters, are the same. We use co from above for initial values.

k <- nlevels(df$specie)
levels(df$specie)
## [1] "Lbcasei"     "Lbparacasei" "Lbrhamnosus"

fit2 <- nls(OD ~ curve_func(time, A[specie], B[specie], C[specie]), 
           df,
           start = list(A = rep(co$A, k), B = rep(co$B, k), C = rep(co$C, k)))
fit2
## Nonlinear regression model
##   model: OD ~ curve_func(time, A[specie], B[specie], C[specie])
##    data: df
##     A1     A2     A3     B1     B2     B3     C1     C2     C3 
## 5.3398 5.3398 5.3398 1.0203 1.0203 1.0203 0.7857 0.7857 0.7857 
##  residual sum-of-squares: 1.267
##
## Number of iterations to convergence: 17 
## Achieved convergence tolerance: 7.608e-06

Since the fit1 and fit2 models are the same, i.e. they give the same predictions, we get a p value of 1 when comparing them using anova in this case.

anova(fit1, fit2)
## Analysis of Variance Table
##
## Model 1: OD ~ curve_func(time, A, B, C)
## Model 2: OD ~ curve_func(time, A[specie], B[specie], C[specie])
##   Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
## 1     78     1.2674                         
## 2     72     1.2674  6      0       0      1

3) nlme Another way to get grouped values is to use nlsList in nlme. This package comes with R so it does not have to be installed, only loaded with a library statement. co is from above.

library(nlme)

fit3 <- nlsList(OD ~ curve_func(time, A, B, C) | specie, 
           df,
           start = list(A = co$A, B = co$B, C = co$C))
fit3
## Call:
##   Model: OD ~ curve_func(time, A, B, C) | specie 
##    Data: df 
##
## Coefficients:
##                    A       B        C
## Lbcasei     5.339834 1.02033 0.785739
## Lbparacasei 5.339834 1.02033 0.785739
## Lbrhamnosus 5.339834 1.02033 0.785739
##
## Degrees of freedom: 81 total; 72 residual
## Residual standard error: 0.1326762

4) Another way is to use Map and the subset= argument of nls. co is from above. The result is a named list of nls objects.

fit4 <- Map(function(x) {
  nls(OD ~ curve_func(time, A, B, C), 
      df,
      start = list(A = co$A, B = co$B, C = co$C),
      subset = specie == x)
}, levels(df$specie))

Note

Same as question except set.seed added and specie is a factor.

set.seed(13)
(time <- rep(c(1:9),9))
(test <- rep(c(1:3),3, each = 9))
(specie <- rep(c("Lbcasei", "Lbparacasei", "Lbrhamnosus"), each = 27))

(OD <- rep(c(runif(4,min=0, max=0.4),
                 runif(1,min=0.4, max=0.6),
                 runif(4,min=0.6, max=1)),
               9))
df <- data.frame(specie = factor(specie), test, time, OD)

Hi, thank you very much for your prompt suggestions! – betterL Jun 16 '23 at 08:15 — betterL, Jun 16 '23 at 08:15

How can I apply a curve fitting function over different columns' levels in a data frame?

1 Answers1

Note