loop to run through multiple datasets and save each output in R

Question

We are trying to run a loop through eight different datasets and then save the output in a standardized format afterwards.

Dataframes are called df1, df2, df3, etc.

I can't share the data (here is a sample of the data), but every dataset is a subset of df1 - so it does have the same columns throughout.

df1 would look like:

age   wt   sex
10    200  F
15    250  F
20    300  F
12    200  M
13    250  M
25    300  M

And the subsetted dfs would be, for example, df2<-df1%>%filter(sex=="F"), df3<-df1%>%filter(sex=="M"), and so on for different conditions.

Here is a small example of the code we want to run for each dataframe.

nls.mon <- nls(wt~A*(1-exp(k*(t0-age))), 
    data=df1,
    start = list(A=253.6,k=.03348,t0=32.02158))

aad_mon_est <- data.frame(tidy(nls.mon)) 
mon_A_est  <- as.numeric(aad_mon_est[1, "estimate"])
mon_k_est  <- as.numeric(aad_mon_est[2, "estimate"])
mon_t0_est <- as.numeric(aad_mon_est[3, "estimate"])

nls.von <- nls(wt ~A*(1-(1/3)*exp(k*(t0-age)))^3, 
    data=df1,
    start=list(A=253.6,k=.03348,t0=32.02158))

aad_von_est <- data.frame(tidy(nls.von))
von_A_est  <- as.numeric(aad_von_est[1, "estimate"])
von_k_est  <- as.numeric(aad_von_est[2, "estimate"])
von_t0_est <- as.numeric(aad_von_est[3, "estimate"])

If there a way to tell the loop to run through each dataframe (df1, df2, df3, etc.) and then save aad_arc_B_est, aad_arc_k_est, and aad_arc_mx_est afterwards?

We are hoping to have an output that would look like:

dataframe  model     A_est   k_est  t0_est
df1        nls.mon   250     10     0.14   
df1        nls.von   350     12     0.13   
df2        nls.mon   150     11     0.15   
df2        nls.von   240     14     0.16
df3        nls.mon   220     11     0.11   
df3        nls.von   450     15     0.10

We are thinking of using an index - something like for (i in dataframe) to have it run through each dataframe, and
dataframe[i,] <- row_i to append each row after?

But, maybe there is a better way?

Rather than independent objects, consider using a `list`. That is, dfList <- list(df1, df2, df3, ..)`. That way, you can use `lapply` or `map` to iterate over the list and produce your outputs. Alternatively, if your input datasets contain the same columns, you could combine them in a single data frame and the `group_by` dataset and use `group-map` to perform your analysis. — Limey, Apr 05 '22 at 15:48

Jakub.Novotny · Accepted Answer · 2022-04-05T20:40:15.833

1

Have you considered changing your code to a function? You can store all data.frames in a named list, then apply the function on each data.frame in the list and then collect the results.

library(tidyverse)
library(broom)

# changing your code to a function
my_function <- function(.df){
  
  nls.mon <- nls(wt~A*(1-exp(k*(t0-age))), 
                 data=.df,
                 start = list(A=253.6,k=.03348,t0=32.02158))
  
  nls.von <- nls(wt ~A*(1-(1/3)*exp(k*(t0-age)))^3, 
                 data=.df,
                 start= list(A=253.6,k=.03348,t0=32.02158))
  
  # I slightly edited this part of your code
  df <- bind_rows(
    tidy(nls.mon) %>% mutate(model = "nls.mon"),
    tidy(nls.von) %>% mutate(model = "nls.von")
  ) %>%
    select(model, term, estimate) %>%
    pivot_wider(names_from = "term", values_from = "estimate")

  return(df)
}

# reading in data, the path to the data needs to be changed
df1 <- read_csv(r"{C:\Users\novot\Downloads\sample.csv}") %>%
  select(-1)

df2 <- df1 %>%
  filter(sex == "M")

# using map to apply the created function to each member of the list
df_out <- list("df1" = df1, "df2" = df2) %>%
  map(
    ~my_function(.x)
  ) %>%
  bind_rows(.id = "dataframe")

df_out
#> # A tibble: 4 x 5
#>   dataframe model       A      k    t0
#>   <chr>     <chr>   <dbl>  <dbl> <dbl>
#> 1 df1       nls.mon  248. 0.0135  2.09
#> 2 df1       nls.von  246. 0.0222 32.9 
#> 3 df2       nls.mon  248. 0.0135  2.09
#> 4 df2       nls.von  246. 0.0222 32.9

edited Apr 05 '22 at 20:40

answered Apr 05 '22 at 15:48

Jakub.Novotny

2,912
2
6
21

We are unable to share the data, but I clarified how the datasets looked in the question. – Blundering Ecologist Apr 05 '22 at 15:55
@BlunderingEcologist it is certainly an improvement although I still do not know what tha values of, for example, `B` and `mx` are so I cannot reproduce your code. – Jakub.Novotny Apr 05 '22 at 16:09
Those values are in the loop itself `start = list(B=120, k=.02, mx=30, my= 82.06))` – Blundering Ecologist Apr 05 '22 at 16:18
@BlunderingEcologist I see. Nevertheless, I still cannot run your code to obtain any results. Since I am not familiar with nls, it is hard to help you. – Jakub.Novotny Apr 05 '22 at 16:26
Oh, I was really hoping that the general structure of the loop wouldn't matter/depend on running the `nls` model because I'd need to post all the data for it to run (which I can't share for privacy reasons). – Blundering Ecologist Apr 05 '22 at 16:33
@BlunderingEcologist When I create myself the df1 according to your example, I still cannot run your code. It throws an error. You are right that the real data does not need to be shared, but a reproducible example that is simple and sufficiently demonstrate the problem at hand would be helpful. – Jakub.Novotny Apr 05 '22 at 16:35
@BlunderingEcologist I edited your example to contain the `df1`. It cannot be run because it throws an error. I would like to kindly ask you to amend the example so that it runs without an error. – Jakub.Novotny Apr 05 '22 at 16:43
Attached a sample of the data that will allow it to run. I also edited the code/question to be more clear about what we are doing with the loop – Blundering Ecologist Apr 05 '22 at 17:01
@BlunderingEcologist I have changed my answer to fit your data. Now it seems to be working fine and achieves what you described. – Jakub.Novotny Apr 05 '22 at 20:41

score 1 · Answer 2 · answered Apr 05 '22 at 23:38

You can wrap your code in a function, and apply the function to a list of your dataframes

f <- function(n,dfs) {
  nls.aad_arc <- nls(wt~ B*atan(k*(age - mx)) + my, data=dfs[[n]],start = list(B=120, k=.02, mx=30, my= 82.06)) 
  aad_arc_est <- data.frame(tidy(nls.aad_arc)) 
  data.frame(
    dataframe = n,
    aad_arc_B_est  <- as.numeric(aad_arc_est[1, "estimate"]), 
    aad_arc_k_est  <- as.numeric(aad_arc_est[2, "estimate"]),
    aad_arc_mx_est <- as.numeric(aad_arc_est[3, "estimate"])  
  )
}

dfs = list('df1'=df1, 'df2'=df2,'df3' = df3)
do.call(rbind, lapply(names(dfs), function(x) f(x,dfs)))

score 0 · Answer 3 · edited Jan 24 '23 at 17:08

The solution that ended up working best for us was this one (it's a bit long, but also pretty flexible/user friendly).

We did it for more than just the initial parameters (A, t0, k), but it can be applied to other equations besides the arctangent function.

#Dataframe and variable names 
dfs      <- list(aad, fad, mad, fnm, fma, mnm, mma) #List of dataframes for analysis 
df_names <- list('aad', 'fad', 'mad', 'fnm', 'fma', 'mnm', 'mma') #list of dataframe names
p_names  <- list('df', 'mod', 'A_val', 'A_val_se', 'B_val', 'B_val_se', 'k_val', 'k_val_se', 't0_val', 't0_val_se', 'm_val', 'm_val_se', 'mx_val', 'mx_val_se', 'my_val', 'my_val_se', 'ran_A_max', 'ran_A_min', 'ran_B_max', 'ran_B_min', 'ran_k_max', 'ran_k_min', 'ran_t0_max', 'ran_t0_min', 'ran_m_max', 'ran_m_min', 'ran_mx_max', 'ran_mx_min', 'ran_my_max', 'ran_my_min') #List of parameters and values
m_names  <- list('arc') #List of model names 
df_num   <- length(dfs) #Number of dataframes
var_num  <- length(p_names) #Number of variables in our table  
mod_num  <- length(m_names) #Number of growth models


#nlme direct output variable names 
arc_names <- list("arc_aad", "arc_fad", "arc_mad", "arc_fnm", "arc_fma", "arc_mnm", "arc_mma")   #list of variable names for nlme.arc outputs


#Creates blank dataframe 
s_pmod <- data.frame(matrix(ncol = var_num, nrow = df_num*mod_num)) #Sets dataframe size from number of models, parameters, and datasets  
colnames(s_pmod) <- p_names #Sets column names 
s_pmod #View blank dataframe 


#Creates dataframe for nlme.arc outputs
nlme_arc.out  <- data.frame(matrix(ncol = df_num, nrow = 1)) #Sets dataframe size from number of models, parameters, and datasets  
colnames(s_pmod) <- p_names #Sets column names 


#Runs each model over all dataframes and appends values into s_pmod    
for (i in 1:df_num) {    
#print(dfs[[i]]) #View full dataframes  


#Arctangent     
#nls fit for determining best initial guess for nlme model 
nls.arc <- nls(wt~ B*atan(k*(age - mx)) + my, #Function
    data = dfs[[i]], #Input dataframe 
    start = list(B = 120, k = .02, mx = 30, my= 82.2)) #Initial guesses  

arc_est <- data.frame(tidy(nls.arc)) #Converts nls output into dataframe
arc_A_est  <- as.numeric(arc_est[1, "estimate"]) #Extracts parameters from dataframe for initial guesses
arc_k_est  <- as.numeric(arc_est[2, "estimate"])
arc_mx_est <- as.numeric(arc_est[3, "estimate"])
arc_my_est <- as.numeric(arc_est[4, "estimate"]) 

#non linear least squares fit
nlme.arc <- nlme(wt ~ (B*atan(k*(age - mx)) + my), #Function
    data=dfs[[i]], #Input dataset 
    fixed=B+k+mx+my~1, #Fixed effects
    random=list(squirrel_id = pdDiag(B+k+mx+my~1)), #pdDiag specifies random affects are uncorrilated  
    start=c(B=arc_A_est, k=arc_k_est, mx=arc_mx_est, my=arc_my_est), #Initial guesses from nls calcuations
    na.action=na.omit, #Omit any NA values
    control=nlmeControl(maxIter=200, pnlsMaxIter=10, msMaxIter=100)) #Maximum number of iterations before determined divergent

#Assign nlme.arc outputs to a unique variable name
assign(arc_names[[i]], nlme.arc)   

#Extracts and define parameters and from nlme output and appends into s_pmod
arc_out <- data.frame(tidy(nlme.arc)) #Creates df with nlme model outputs
s_pmod[(1+mod_num*(i-1)), 'B_val']  <- as.numeric(arc_out[1, "estimate"]) #Extracts individual model outputs as numerics 
s_pmod[(1+mod_num*(i-1)), 'k_val']  <- as.numeric(arc_out[2, "estimate"])  
s_pmod[(1+mod_num*(i-1)), 'mx_val'] <- as.numeric(arc_out[3, "estimate"])
s_pmod[(1+mod_num*(i-1)), 'my_val'] <- as.numeric(arc_out[4, "estimate"])
s_pmod[(1+mod_num*(i-1)), 'A_val']  <- ((as.numeric(arc_out[1, "estimate"])*pi)/2)+ as.numeric(arc_out[4, "estimate"]) #Calculates and appends upper asymptote for arctangent

#Appends parameter standard error measurements into s_pmod
arc_out <- data.frame(tidy(nlme.arc)) #Creates df with nlme model outputs
s_pmod[(1+mod_num*(i-1)), 'B_val_se']  <- as.numeric(arc_out[1, "std.error"]) #Extracts parameter standard error and appends into s_pmod  
s_pmod[(1+mod_num*(i-1)), 'k_val_se']  <- as.numeric(arc_out[2, "std.error"])  
s_pmod[(1+mod_num*(i-1)), 'mx_val_se'] <- as.numeric(arc_out[3, "std.error"])
s_pmod[(1+mod_num*(i-1)), 'my_val_se'] <- as.numeric(arc_out[4, "std.error"])
s_pmod[(1+mod_num*(i-1)), 'A_val_se']  <- (as.numeric(arc_out[1, "std.error"])*pi)/2 #Calculates and appends standard error of parameter A

#Extracts random effects and appends values into s_pmod 
arc_ran <- ranef(nlme.arc) #Extracts all random effects in the dataframe 
s_pmod[(1+mod_num*(i-1)), 'ran_B_max']  <- max(arc_ran$B) #Extracts and appends maximum random effect for parameter
s_pmod[(1+mod_num*(i-1)), 'ran_B_min']  <- min(arc_ran$B) #Extracts and appends minimum random effect for parameter 
s_pmod[(1+mod_num*(i-1)), 'ran_k_max']  <- max(arc_ran$k)
s_pmod[(1+mod_num*(i-1)), 'ran_k_min']  <- min(arc_ran$k)
s_pmod[(1+mod_num*(i-1)), 'ran_mx_max'] <- max(arc_ran$mx)
s_pmod[(1+mod_num*(i-1)), 'ran_mx_min'] <- min(arc_ran$mx)
s_pmod[(1+mod_num*(i-1)), 'ran_my_max'] <- max(arc_ran$my)
s_pmod[(1+mod_num*(i-1)), 'ran_my_min'] <- min(arc_ran$my)

s_pmod[(1+mod_num*(i-1)), 'df']         <- df_names[i] #Appends dataframe name


}#End of loop

Which gives this output:

s_pmod %>% filter(mod=="arc")
   df mod    A_val  A_val_se    B_val  B_val_se      k_val     k_val_se t0_val t0_val_se m_val m_val_se   mx_val mx_val_se   my_val my_val_se ran_A_max ran_A_min ran_B_max ran_B_min   ran_k_max
1 aad arc 253.9916 0.7015467 107.1339 0.4466185 0.02322817 0.0001367976     NA        NA    NA       NA 38.51732 0.2957660 85.70604 0.5693539        NA        NA  33.14024 -31.18713 0.005225140
2 fad arc 260.7412 1.1965906 144.9240 0.7617732 0.01653624 0.0001315679     NA        NA    NA       NA 13.69431 0.4264706 33.09517 1.0042012        NA        NA  39.99318 -30.58696 0.011562276
3 mad arc 259.7367 0.8902014 105.3544 0.5667198 0.02458354 0.0002119215     NA        NA    NA       NA 42.28176 0.3671864 94.24638 0.7103262        NA        NA  19.11504 -31.07500 0.005268441
4 fnm arc 250.3786 1.1041547 105.6588 0.7029267 0.02316940 0.0002086655     NA        NA    NA       NA 38.23024 0.4760433 84.41015 0.9010855        NA        NA  22.95082 -20.57937 0.003980173
5 fma arc 260.0539 2.0168075 148.5924 1.2839396 0.01517521 0.0002634139     NA        NA    NA       NA 10.76892 0.7676714 26.64548 1.7216544        NA        NA  40.69058 -28.85547 0.011894187
6 mnm arc 260.5745 1.0445654 104.6731 0.6649910 0.02487617 0.0002508612     NA        NA    NA       NA 43.30369 0.4131035 96.15442 0.7882765        NA        NA  18.67611 -33.10133 0.005059236
7 mma arc 258.7071 1.6936251 107.2850 1.0781952 0.02405637 0.0003971414     NA        NA    NA       NA 40.17088 0.7339977 90.18416 1.4159323        NA        NA  19.59260 -18.45392 0.005446648
     ran_k_min ran_t0_max ran_t0_min ran_m_max ran_m_min      ran_mx_max       ran_mx_min      ran_my_max      ran_my_min
1 -0.007378940         NA         NA        NA        NA 0.0000105015210 -0.0000150388369 21.817420360305 -19.49415109139
2 -0.012402311         NA         NA        NA        NA 0.0000024081791 -0.0000027762636  0.000001706597  -0.00000156346
3 -0.009515597         NA         NA        NA        NA 1.6232147679678 -2.1681922266485 24.362426128753 -24.18797605980
4 -0.004196356         NA         NA        NA        NA 0.0000244818069 -0.0000307554261 15.925214720426 -13.92832902269
5 -0.011169350         NA         NA        NA        NA 0.0000007861126 -0.0000006581591  0.000124799561  -0.00012778252
6 -0.009676215         NA         NA        NA        NA 5.8551715194943 -6.5436633258442 18.513325143935 -22.03287338292
7 -0.005949639         NA         NA        NA        NA 0.0000019910474 -0.0000021254555 20.710967774449 -13.82431195267

Note that you need to create a blank dataframe first before running the loops.

loop to run through multiple datasets and save each output in R

3 Answers3

Linked