0

I have a group of 10 subjects (using a subset of 3 of them below as a sample dataset). I have managed to create a for loop that runs through the 3 subjects and fits each one to a sigmoidal curve using nlsLM (package minpack.lm). nlsLM is the same as nls, it just uses a Levenberg-Marquardt algorithm instead of the Gauss-Newton algorithm used with the base nls function.

I have also manage to extract the model parameters (plateau, S50, slope) and put them in a dataframe. My code is as follows:

dfRC <- read_csv("data/sampleRCdf.csv") #read in the raw data files

# list of subjects of interest to loop through
sub <- unique(dfRC$subID)

# define start to obtain the number and name of fit parameters
start <- list(plateau=1, S50=1, slope=1)

# create empty data.frame to store IDs and parameters
params.pre <- data.frame(matrix(nrow = length(sub), ncol = 1+length(start)))
names(params.pre) <- c("sub", names(start))

# nested for loop that goes by subject (i)
for(i in seq_along(sub)) {
  # create data frame for sub "i"
    individual_DFs <- dfRC %>% filter (subID %in% sub[i])
  
  # fit model for each sub "i"
    fitpre.loop <- nlsLM(mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))), 
                         data = individual_DFs,
                         start = start, trace = TRUE)
    
  # store IDs
    params.pre[i,1] <- sub[i]
  # store fit parameters
    params.pre[i,2:ncol(params)] <- fitpre.loop$m$getPars()
}

params.pre

params.pre gives:

sub plateau     S50         slope
101 3.579751    6.505194    0.6930363   
202 2.506159    3.538753    0.8300668   
303 1.971020    5.888228    0.4806047   

This is for the following dataset (where 101, 202, and 303 are my sample subjects), where 101.y.pre is mepAMP_pre and x is state in the for loop above:

x = c(1,2,3,4,5,6,7,8,9,10,11)

101.y.pre <- c(0.38117575, 0.11300747, 0.37239499, 0.51321839, 0.56851893, 
              1.73259296, 2.08146847, 2.80090151, 3.04446933, 2.67647473, 3.87695509)

202.y.pre <- c(0.263931535, 0.554056564, 0.903243066, 1.758670072, 1.512232414,
              2.382228869, 2.744255537, 1.943985522, 2.642561877, 2.880719751, 2.139018852)

303.y.pre <- c(0.197647216, 0.095434883, 0.523944806, 0.625025631, 0.92489588, 
              0.898288637, 0.918388724, 1.433502882, 2.127665395, 1.649622992, 1.642610593)

I had a few questions.

  1. I did a sanity check where I ran each subject individually through nlsLM and made sure the output parameters from that matched the output from my for loop. They do! So, it works. However, I'm a little confused about the output of the individual_DFs dataframe. I thought it would show all 3 subjects, but when I open it up it just shows subject 303. Clearly the code works, but I'm confused why only 1 subject is showing up in the dataframe?

  2. You'll notice I named the param output as params.pre and use the mepAMP_pre values for the model - this is because I have both pre- and post-treatment data for each subject. The .csv file has the column headings "subID", "state", "mepAMP_pre", and "mepAMP_post". Is there a way to use a for loop to loop over both the pre and post values and spit out plateau_pre, plateau_post, S50_pre, S50_post, slope_pre, slope_post and pull them all out into a single dataframe like I currently have with params.pre with the different subjects as rows?

  3. Any advice on how I could graph it? I have managed to graph the pre- fit for one subject using the following code:

df101 <- data.frame(x, fit101.y)

p.sample <- ggplot(data = df101, aes(x = x, y = fit101.y)) +
  geom_point() +
  geom_smooth(method = "nls", 
              data = df101,
              formula = fit101.y ~ plateau / (1 + exp(slope*(S50 - x))), start = list( plateau=1, S50=1, slope=1), 
              se = FALSE)
p.sample

enter image description here

I'd like to maybe overlap the curves for each subject pre- and post-per subject in a grid-like pattern of figures. Or, perhaps graph all the pre- curves onto one figure and all post- curves onto another figure. Would appreciate any help with doing either of these!

---------------EDIT---------------

Thank you Allan for your help! Here is my full data frame (output from dput(dfRC))

structure(list(subID = c(101, 101, 101, 101, 101, 101, 101, 101, 
101, 101, 101, 202, 202, 202, 202, 202, 202, 202, 202, 202, 202, 
202, 303, 303, 303, 303, 303, 303, 303, 303, 303, 303, 303), 
    state = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 
    5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), 
    mepAMP_pre = c(0.38117575, 0.11300747, 0.37239499, 0.51321839, 
    0.56851893, 1.73259296, 2.08146847, 2.80090151, 3.04446933, 
    2.67647473, 3.87695509, 0.263931535, 0.554056564, 0.903243066, 
    1.758670072, 1.512232414, 2.382228869, 2.744255537, 1.943985522, 
    2.642561877, 2.880719751, 2.139018852, 0.197647216, 0.095434883, 
    0.523944806, 0.625025631, 0.92489588, 0.898288637, 0.918388724, 
    1.433502882, 2.127665395, 1.649622992, 1.642610593), mepAMP_post = c(0.126321776, 
    0.566816552, 0.374254417, 0.199486984, 0.510302018, 1.03651474, 
    1.697137046, 2.090100867, 3.448320717, 2.095180146, 2.897606435, 
    0.018846444, 0.041664734, 0.51243325, 0.961881685, 0.998366952, 
    2.082848001, 2.713030559, 3.373811346, 2.839989549, 3.283945894, 
    3.052075374, 0.232427913, 0.895619231, 1.194016429, 1.721528554, 
    2.249776715, 2.756416541, 4.716890788, 4.16244235, 4.757734573, 
    4.965043759, 4.732616496)), row.names = c(NA, -33L), spec = structure(list(
    cols = list(subID = structure(list(), class = c("collector_double", 
    "collector")), state = structure(list(), class = c("collector_double", 
    "collector")), mepAMP_pre = structure(list(), class = c("collector_double", 
    "collector")), mepAMP_post = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x10d8e15d0>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

I tried running the code you included and the updated params.pre worked great, however the plot came out weird and I'm not sure where I'm going wrong.

The code:

params.pre <- do.call("rbind", lapply(split(dfRC, dfRC$subID), function(d) {
  mod <- nlsLM(mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))), 
               data = d, start = list(plateau = 1, S50 = 1, slope = 1))
  as.data.frame(c(list(sub = d$subID[1]), as.list(coef(mod))))
}))

ggplot(dfRC, aes(state, mepAMP_pre, color = subID)) +
  geom_point() +
  geom_smooth(method = nlsLM, se = FALSE,
              formula = mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))),
              method.args = list(start = list(plateau = 1, S50 = 1, slope = 1)))

The figure it outputs: enter image description here

A.R.
  • 51
  • 5

1 Answers1

1

You don't need to chop up dfRC to get the plot working. First pivot into long format:

library(minpack.lm)
library(tidyverse)

dfRC_long <- dfRC %>%
  pivot_longer(contains('mep'), names_sep = '_', 
               names_to = c('.value', 'prepost')) %>%
  mutate(subID = factor(subID), prepost = factor(prepost, c('pre', 'post')))

Now your plotting code is just:

ggplot(dfRC_long, aes(state, mepAMP, colour = subID)) +
  geom_point() +
  geom_smooth(method = nlsLM, se = FALSE, 
              formula = y ~ plateau / (1 + exp(slope*(S50 - x))),
              method.args = list(start = list(plateau = 1, S50 = 1, slope = 1))
             ) +
  facet_grid(.~prepost)

enter image description here

Similarly, you can skip an explicit loop to generate your table using some tidyverse functions:

dfRC_long %>%
  group_by(subID, prepost) %>%
  group_map(.f = ~ nlsLM(mepAMP ~ plateau / (1 + exp(slope*(S50 - state))), 
                         data = .x, 
                         start = list(plateau = 1, S50 = 1, slope = 1)) %>%
              coef() %>%
              t() %>%
              as.data.frame() %>%
              mutate(pre_or_post = .y$prepost, .before = 1) %>%
              mutate(subID = .y$subID, .before = 2)) %>%
  bind_rows() %>%
  arrange(pre_or_post, subID)
#>   pre_or_post subID  plateau      S50     slope
#> 1         pre   101 3.579751 6.505194 0.6930363
#> 2         pre   202 2.506159 3.538753 0.8300668
#> 3         pre   303 1.971020 5.888228 0.4806047
#> 4        post   101 2.874621 6.538601 0.9221484
#> 5        post   202 3.225695 5.356826 0.9406343
#> 6        post   303 5.084059 5.094672 0.6321269

Created on 2023-08-16 with reprex v2.0.2

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Thanks, Allan! The params.pre updated code works great, however I'm having some trouble with the ggplot figure. I've included it in an edit in my post along with the output of dput(dfRC) for a reproducible example of my data. Would appreciate any guidance. – A.R. Aug 17 '23 at 14:32
  • 1
    @A.R. see my update. The problem is that your subID was numeric rather than categorical. The above code fixes that and additionally puts all your parameter estimates for both pre and post into a single table. – Allan Cameron Aug 17 '23 at 14:59
  • Thank you so much! All is running beautifully and smoothly now, and giving me all the plots I need without issue. I really appreciate your help. – A.R. Aug 21 '23 at 15:05
  • Sorry to revive this. I just tried running this on a larger sample of subjects (7 subjects so far) and am now getting the following error for calculation of model parameters (`params`): Warning: lmdif: info = -1. Number of iterations has reached `maxiter' == 50. Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates. Any idea what might be going wrong? – A.R. Aug 23 '23 at 16:42
  • 1
    @A.R. it suggests that for one of your data sets the function cannot find the optimal values for the parameters (at least with the given start values). This is quite common in nls fits. Your options are to firstly check the data is correct (and approximately fits the model), try different start parameters, or try a different model – Allan Cameron Aug 23 '23 at 17:12
  • Thank you! I ended up running an SSlogis function first to just get an idea of starting values to then use in my nlsLM equation. Hopefully the last question I will have - I'm trying to set an upper constraint on my plateau value. I believe nlsLM has `upper` as an option to set a constraint. Do you know, or could point me in the direction of, the syntax for setting an upper bound value for the plateau? From what I've found, a lot of examples show something like `upper = c(1, 3000)` but I'm not sure where those numbers are from and how to set a constraint *only* for one of the parameters. – A.R. Aug 24 '23 at 21:05