fitting with conditional in grouped data

Question

Sorry for too long question but I'll try to be clear as possible about the problem.

I am trying to do fitting in different groups in data and trying to get fitting coefficients for each group.

I looked around but could not exactly the same problem but found some similar posts like below,

Trying to fit data with R and nls on a function with a condition in it

But it seems that the fitting doesn't seem to care the conditional setting so I am getting the same fitting coeffs for different groups.(This is also the same case for my real data.)

basically, what try to do is use different set of fitting coefficients if gr==a fit that group else fit gr==b.

I am using nlsLM from minpack.lm package since I also need to set the starting values for fitting coefs.

Here is the code which I have tried:

library(minpack.lm)

set.seed(95)

df <- data.frame(gr=rep(seq(1,2),each=10),sub_gr=rep(rep(c("a","b"),each=5),2),
              y = rep(c(sort(runif(5,0,0.5),decreasing=TRUE), sort(runif(5,0,0.5),,decreasing=TRUE)),2),
              x = rep(c(seq(0.1,0.5,0.1)),4))

#creating empty list to fill with fitting coefficients afterwards based on @Hack-R solution Error: Results are not data frames at positions:

empty_dat <- structure(list(x = numeric(0), y = numeric(0), gr = integer(0), sub_gr = character(0), 
              pred_fit = numeric(0), k_a = numeric(0), k_b = numeric(0),
              t_a = numeric(0), t_b= numeric(0)), class = "data.frame")

#do the fitting in groups


for(x in unique(df$gr)){

  #trycatch to   

  fit <- tryCatch(nlsLM(y~ifelse(sub_gr=='a', k_a*x+t_a, k_b*x+t_b),
                        data=df[df$gr==x,],start=c(k_a=0.3,k_b=0.4,t_a=0.1,t_b=0.2),

                        lower = c(0.05, 0.05, 0,0),
                        upper = c(1,1,1,1),                 
                        trace=T,na.action=na.omit, control = nls.lm.control(maxiter=100)),error=function(e) NULL)

  if(!("NULL" %in% class(fit))){
    pred_fit <- predict(fit, newdata =df$x)

    coefs_fit <- data.frame(k_a=coef(fit)[1],k_b=coef(fit)[2],t_a=coef(fit)[3], t_b=coef(fit)[4])



#filling empty_data with coefs and df's original values
        empty_dat <- rbind(empty_dat,data.frame(df[df$gr==x,],coefs_fit,pred_fit,row.names=NULL))
              }   
            }

empty_dat

  gr sub_gr          y   x  k_a  k_b       t_a       t_b  pred_fit
1   1      a 0.28792044 0.1 0.05 0.05 0.1343742 0.2156747 0.1393742
2   1      a 0.24443957 0.2 0.05 0.05 0.1343742 0.2156747 0.1443742
3   1      a 0.07585577 0.3 0.05 0.05 0.1343742 0.2156747 0.1493742
4   1      a 0.03522243 0.4 0.05 0.05 0.1343742 0.2156747 0.1543742
5   1      a 0.02654922 0.5 0.05 0.05 0.1343742 0.2156747 0.1593742
6   1      b 0.48498563 0.1 0.05 0.05 0.1343742 0.2156747 0.2206747
7   1      b 0.18702842 0.2 0.05 0.05 0.1343742 0.2156747 0.2256747
8   1      b 0.15186749 0.3 0.05 0.05 0.1343742 0.2156747 0.2306747
9   1      b 0.15003048 0.4 0.05 0.05 0.1343742 0.2156747 0.2356747
10  1      b 0.07638354 0.5 0.05 0.05 0.1343742 0.2156747 0.2406747
11  2      a 0.28792044 0.1 0.05 0.05 0.1343742 0.2156747 0.1393742
12  2      a 0.24443957 0.2 0.05 0.05 0.1343742 0.2156747 0.1443742
13  2      a 0.07585577 0.3 0.05 0.05 0.1343742 0.2156747 0.1493742
14  2      a 0.03522243 0.4 0.05 0.05 0.1343742 0.2156747 0.1543742
15  2      a 0.02654922 0.5 0.05 0.05 0.1343742 0.2156747 0.1593742
16  2      b 0.48498563 0.1 0.05 0.05 0.1343742 0.2156747 0.2206747
17  2      b 0.18702842 0.2 0.05 0.05 0.1343742 0.2156747 0.2256747
18  2      b 0.15186749 0.3 0.05 0.05 0.1343742 0.2156747 0.2306747
19  2      b 0.15003048 0.4 0.05 0.05 0.1343742 0.2156747 0.2356747
20  2      b 0.07638354 0.5 0.05 0.05 0.1343742 0.2156747 0.2406747

as we can see clearly the coefficients k_a, k_b and t_a , t_b is identical for each gr and sub_gr.

If I want to plot the result and predicted values of fitting

fitting lines telling the different story:))

library(ggplot2)

ggplot(df, aes(x=x, y=y,col=sub_gr,shape=sub_gr)) + 
  geom_point(size=6,alpha=0.8,stroke=1.4)  +
  theme_bw()+
  facet_wrap(~gr,scales='free')+
  scale_color_manual(values=c("blue","red"))+
  geom_line(data=empty_dat,aes(x=x,y=pred_fit,group=sub_gr,col=sub_gr))

score 0 · Answer 1 · answered Nov 17 '17 at 05:11

0

Here's a possible solution. However, based on how you've set up the sample data, I run into the same problem you mentioned, that the models are the same for each group. That said, the training data is exactly the same for both groups, so I'm not surprised to get identical coefficients.

library(tidyverse)
library(broom)

# a function to build the model 
makefit <- function(df) {
    tryCatch(nlsLM(y~ifelse(sub_gr=='a', k_a*x+t_a, k_b/x+t_b),
                        data=df,start=c(k_a=0.3,k_b=0.4,t_a=0.1,t_b=0.2),

                        lower = c(0.05, 0.05, 0,0),
                        upper = c(1,1,1,1),                 
                        trace=T,na.action=na.omit, control = nls.lm.control(maxiter=100)),error=function(e) NULL)
}

# a function to get the coefficients out of the model
myaugment <- function(fit) {
    data.frame(k_a=coef(fit)[1],k_b=coef(fit)[2],t_a=coef(fit)[3], t_b=coef(fit)[4])
}


dfprep <- df %>% 
  group_by(gr) %>% 
  # nest the other variables as a list col
  nest() %>% 
  mutate(
       # build a model for each group
       model = map(data, makefit)
       # get the coeffients
     , modelaugment = map(model, myaugment)
  )


 # extract the results
results <- dfprep %>% 
  # extract the original data
  unnest(data) %>% 
  # join in the coefficent data
  left_join(dfprep %>% unnest(modelaugment), by = 'gr')

Also, I wasn't sure where you were getting your new testing data set from since it wasn't included in your example, so I didn't include a way to get that. However, building that into the makeaugment() function should be pretty straightforward.

answered Nov 17 '17 at 05:11

crazybilly

2,992
1
16
42

Thank you very much for your answer.OTH, `results` still giving the same identical coefficients for different `sub_gr`. But your code is more shorter and clean. Also, could not understand your statement about `new testing data` ? There is only one data set and it is `df` afterwards I create `empty_dat` to fill with fitting outputs. – Alexander Nov 17 '17 at 05:47
If your original example, you predict new data from `new.range`, which I assumed was another object like df because it wasn't referenced elsewhere. With regard to the models: I'm not clear what the formulas in your model mean (not being intimately familiar with non-linear models, I don't quite see where `k_a` comes from), but I wonder if you need to use `I()` in the formula--since you're dividing the second half, are you intending to multiplying or to specify interaction? See paragraphs 2 - 3 of the help for `formula()`. – crazybilly Nov 17 '17 at 14:54
Oh I see. Sorry about that. I forgot to modify that part when did the copy past. I modified the OP. I also edited the equation part. The equation is also should be the same. – Alexander Nov 17 '17 at 22:58
I added that pred_fit part to show that since we cannot fit correctly in groups the fitting line distorted very much from the real data. – Alexander Nov 17 '17 at 23:01
To be clear, when you write your variables in the formula, do you mean k_a multiplied by x plus t_a or do you mean the interaction of k_a and x as well as t_a? – crazybilly Nov 19 '17 at 03:55
I meant k_a multiplied by x plus t_a that is the equation. – Alexander Nov 19 '17 at 05:17

fitting with conditional in grouped data

1 Answers1