1

I have a dataset where I want to apply non linear least squares by group. This is a continuation to my previous question: NLS Function - Number of Iterations Exceeds max

The dataset looks like this:

df
x        y    GRP
0        0      1
426   9.28      1
853   18.5      1
1279  27.8      1
1705  37.0      1
2131  46.2      1
0        0      2
450   7.28      2
800   16.5      2
1300  30.0      2
2000  40.0      2
2200  48.0      2  

If I were to do this with one group it would be like this:

df1<-filter(df, GRP==1)

a.start <- max(df1$y)
b.start <- 1e-06
control1 <- nls.control(maxiter= 10000,tol=1e-02, warnOnly=TRUE)
nl.reg <- nls(y ~ a * (1-exp(-b * x)),data=df1,start= 
list(a=a.start,b=b.start),
           control= control1)
coef(nl.reg)[1]
coef(nl.reg)[2]

> coef(nl.reg)[1]
       a 
5599.075 
> coef(nl.reg)[2]
       b 
3.891744e-06 

I would then do the same thing for GRP2. I want my final output to look like this:

x        y    GRP                       a                       b
0        0      1                5599.075            3.891744e-06
426   9.28      1                5599.075            3.891744e-06
853   18.5      1                5599.075            3.891744e-06
1279  27.8      1                5599.075            3.891744e-06
1705  37.0      1                5599.075            3.891744e-06
2131  46.2      1                5599.075            3.891744e-06
0        0      2    New Value for a GRP2    New Value for b GRP2     
450   7.28      2    New Value for a GRP2    New Value for b GRP2
800   16.5      2    New Value for a GRP2    New Value for b GRP2
1300  30.0      2    New Value for a GRP2    New Value for b GRP2
2000  40.0      2    New Value for a GRP2    New Value for b GRP2
2200  48.0      2    New Value for a GRP2    New Value for b GRP2

Ideally I think dplyr would be the best way but I can't figure out how to do it. This is what I think it will probably look like:

control1 <- nls.control(maxiter= 10000,tol=1e-02, warnOnly=TRUE)
b.start <- 1e-06

df %>%
  group_by(GRP) %>%
  do(nlsfit = nls( form = y ~ a * (1-exp(-b * x)), data=., 
start= list( a=max(.$y), b=b.start),
      control= control1) ) %>%
  list(a = coef(nlsfit)[1], b = coef(nlsfit)[2])

Error:

 in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

Not really sure how to do this though and any help would be great. Thanks!

nak5120
  • 4,089
  • 4
  • 35
  • 94
  • There is an error in your code `df1<-filter(df, GRP=1)` probably should be `df1<-filter(df, GRP==1)` – IRTFM Oct 08 '18 at 22:06
  • Thanks for catching that, I didn't actually run that part. Was just showing it for demonstration purposes. – nak5120 Oct 08 '18 at 22:09
  • @42 does the question make sense? – nak5120 Oct 08 '18 at 22:10
  • It does. Does the answer need to be tidyversical? I find that environment rather difficult to understand much of the time. – IRTFM Oct 08 '18 at 22:13
  • Nope it just needs to by dynamic where it doesn't matter how many groups there are. My actual dataset has 5000 groups making the total dataset close to 1MM rows. Dplyr is usually the fastest with handling that from what I've been researching. – nak5120 Oct 08 '18 at 22:19
  • I think this is a duplicate (despite it having an utterly cryptic title) with a tidyverse answer by @Hack-R: https://stackoverflow.com/questions/45220231/error-results-are-not-data-frames-at-positions – IRTFM Oct 08 '18 at 22:23

1 Answers1

1

I initially got the same error message (re: not finding object 'y' in nls) as I did with a tidyverse stab when initially attempting to use the lapply-split-function paradigm and went searching for: "[r] using nls inside function". I've changed my original use of attach to list2env:

sapply(  split( df , df$GRP), function(d){ dat <- list2env(d)
    nlsfit <- nls( form = y ~ a * (1-exp(-b * x)), data=dat, start= list( a=max(y), b=b.start),
          control= control1) 

list(a = coef(nlsfit)[1], b = coef(nlsfit)[2])} )
#---

  1            2            
a 14.51827     441.5489     
b 2.139378e-06 -6.775562e-06

You also get warnings that you were expecting. These could be suppressed with suppressWarnings( ... )

One of the suggestions was to use attach. Which I then did with extreme reluctance, since I have often warned newbies not to ever use attach. But here it seemed to force a local environment to be constructed. I'm more comforatable with list2env as a mechaism to satisfy nls. The top of the code for nls was what led me to that choice:

if (!is.list(data) && !is.environment(data)) 
    stop("'data' must be a list or an environment")
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Awesome thanks! I'm away from my computer until the morning but will attempt to transpose the results so they look like the final output in the question. – nak5120 Oct 08 '18 at 23:04
  • The structure of the result is a 4 item list with a length-2 dimension. It does respond satisfactorily to the `t` function. – IRTFM Oct 08 '18 at 23:27
  • I tested this out this morning and it is giving me the error of `Error in nls(form = y ~ a * (1 - exp(-b * x)), data = dat, : object 'y' not found` – nak5120 Oct 09 '18 at 14:19
  • When I change it to be `dat$y` and `dat$x` in the model, it throws me a new error stating: `Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates In addition: There were 50 or more warnings (use warnings() to see the first 50)` – nak5120 Oct 09 '18 at 14:23
  • Still working for me. Using `$` inside formulas is generally a bad thing. And you have not included full code and data so are violating the SO rule requiring [MCVE] for questions about errors. The error makes me suspect you are again using perfect data, and you've been warned about that issue. – IRTFM Oct 09 '18 at 17:47
  • Ok thanks, will take a look further into why this is happening. Appreciate the help! – nak5120 Oct 09 '18 at 18:26
  • By the way, I figured out what was happening. There were cases where y equaled 0. When that happened it through the error. So instead, I added disturbance or random noise to make it not be equal to 0. This ended up not throwing the error. Your code also ran extremely fast for a 1MM row dataset with over 5000 groups. Appreciate the help! – nak5120 Oct 09 '18 at 21:48
  • When you transpose it and then convert it to a dataframe like this: `df1<-as.data.frame(t(output))`, the values are still a part of a list. How do I avoid that? – nak5120 Oct 09 '18 at 23:13
  • Question is here: https://stackoverflow.com/questions/52730383/convert-list-values-within-a-dataframe-to-single-level-numbers – nak5120 Oct 09 '18 at 23:13