-1

I'm a graduate student using a linear regression (count) model to understand drivers of fish movement into and out of tidal wetlands. I am currently trying to generate a publication-worthy model summary table in r. I've been using the sel.table function which has been working well for this purpose.

However, I've been unable to generate a column that contains the individual model formulas. Below is my code which is based off of some nice instructions for using the MuMIn package. https://sites.google.com/site/rforfishandwildlifegrads/home/mumin_usage_examples

So to recap, my question pertains to the last line of code below,

How can I insert model formulas into a model selection table.**

install.packages("MuMIn")
library(MuMIn)

data = mtcars

models = list(
  model1 <- lm(mpg ~ cyl, data = data),
  model2 <- lm(mpg ~ cyl + hp, data = data),
  model3 <- lm(mpg ~ cyl * hp, data = data)
)

#create an object “out.put” that contains all of the model selection information
out.put <- model.sel(models)

#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)[6:10]

#add a column for model names
sel.table$Model <- rownames(sel.table)

#replace model name with formulas
for(i in 1:nrow(sel.table)) sel.table$Model[i]<- as.character(formula(paste(sel.table$Model[i])))[3]

#Any help on this topic would be greatly appreciated!

UPDATED CODE

My method of pulling out model names is pretty clunky but otherwise this code seems to generate what I intended (a complete model selection table). Also, I'm not sure if the model coefficients are displayed properly but I hope to follow up on this for my final answer.

data = mtcars

#write linear models
models = list(
  model1 <- lm(mpg ~ cyl, data = data),
  model2 <- lm(mpg ~ cyl + hp, data = data),
  model3 <- lm(mpg ~ cyl * hp + disp, data = data),
  model4 <- lm(mpg ~ cyl * hp + disp + wt + drat, data = data)
)

#create an object “out.put” that contains all of the model selection information
out.put <- model.sel(models)

#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)

#slightly rename intercept column
names(sel.table)[1]="Intercept"

#select variables to display in model summary table
sel.table <- sel.table %>% 
 select(Intercept,cyl,hp,disp,wt,drat,df,logLik,AICc,delta)

#round numerical coumns
sel.table[,1:6]<- round(sel.table[,1:6],2)

sel.table[,8:10]<-round(sel.table[,8:10],2)

#add a column for model (row) names
sel.table$Model <- rownames(sel.table)

#extract model formulas
form <- data.frame(name = as.character(lapply(models, `[[`, c(10,2))))

#generate a column with model (row) numbers (beside associated model formulas)
form <- form %>% 
  mutate(Model=(1:4))

#merge model table and model formulas
sum_table <- merge (form,sel.table,by="Model")

#rename model equation column 
names(sum_table)[2]="Formula"

print <- flextable(head(sum_table))
print <- autofit(print)
print

6/1/20 UPDATE:

Below is an image that describes two issues that I'm having with the code. I've found a workaround to the first question but I'm still investigating the second. see details here

  1. Models end up being misnumbered
  2. Model formula columns are being generated for each model
Dave A
  • 1
  • 1
  • welcome to stackoverflow. I'm afraid I don't understand your question. What are you trying to do exaclty? What is `out.put` ? Are all the nine models relevant in your question? If not feel free to remove all redundant information. – DJJ May 28 '20 at 21:03
  • I am trying to generate a column of model formulas (i.e. Diel_phase*Stage_Diurnal_Low for model 3) for my model summary table. – Dave A May 28 '20 at 23:01
  • Please add the packages you use with this code. Where does the `%>%` operator come from? – Kamil Bartoń Jun 01 '20 at 11:28

3 Answers3

1

I believe there is a part of the code missing in the examples you followed, that is why your code does not work.

The easiest way to generate formula-like strings is simply to deparse the right hand side of the model formulas (i.e. 3-rd element):

sapply(get.models(out.put, TRUE), function(mo) deparse(formula(mo)[[3]], width.cutoff = 500))

or, if you want A*B's expanded into A + B + A:B:

sapply(get.models(out.put, TRUE), function(mo) deparse(terms(formula(mo), simplify = TRUE)[[3]], width.cutoff = 500))

Update: the original example code improved and simplified:

library(MuMIn)

data <- mtcars

#! Feed the models directly to `model.sel`. No need to create a separate list of
#! models.
gm <- lm(mpg ~ cyl, data = data)
out.put <- model.sel(
  model1 = gm,
  model2 = update(gm, . ~. + hp),
  model3 = update(gm, . ~ . * hp + disp),
  model4 = update(gm, . ~ . * hp + disp + wt + drat)
  )

sel.table <- out.put
sel.table$family <- NULL
sel.table <- round(sel.table, 2)
#! Use `get.models` to get the list of models in the same order as in the
#! selection table
sel.table <- cbind(
    Model = 
#! Update (2): model number according to their original order, use:
       attr(out.put, "order"),
#!     otherwise: seq(nrow(sel.table)),
#!
#! Update (2): add a large `width.cutoff` to `deparse` so that the result is
#!         always a single string and `sapply` returns a character vector
#!         rather than a list.
#!         For oversize formulas, use `paste0(deparse(...), collapse = "")`  
    formula = sapply(get.models(out.put, TRUE),
        function(mo) deparse(formula(mo)[[3]], width.cutoff = 500)),
#!
    sel.table
    )
Kamil Bartoń
  • 1,482
  • 9
  • 10
  • Thanks for the clarification on that Kamil. It's really useful to know how to pull out all of that information! I ended up retrieving the same level of information that's included in your first suggestion, although using a slightly different method. In my updated code...do you know if the coefficients are displayed properly with respect to the model formulas (i.e. models 3 and 4 with the * operator)? Also, I realize my way of numbering the models is pretty clunky but I wasn't able to pull "model1" etc out of the model objects. – Dave A May 31 '20 at 18:40
  • @DaveA Extracting object items by numeric indices is a straight way to run into trouble. Please unlearn it. Unless you have created the object yourself and you are certain the order of items is fixed, do not use numeric indices. The best practice is to use extractor functions like `getCall` (equivalent to `model[[10]]` in **this particular case**), for some classes `model$call` will work as well. So, `getCall(model)$formula` may work in most cases, but that's still an overly complex code. `formula(model)` simply does the trick. – Kamil Bartoń Jun 01 '20 at 11:26
  • @DaveA I've added the full code from your example with my modifications. – Kamil Bartoń Jun 01 '20 at 14:42
0
library(MuMIn)

data <- mtcars

#! Feed the models directly to `model.sel`. No need to create a separate list of
#! models.
gm <- lm(mpg ~ cyl, data = data)
out.put <- model.sel(
  model1 = gm,
  model2 = update(gm, . ~. + hp),
  model3 = update(gm, . ~ . * hp + disp),
  model4 = update(gm, . ~ . * hp + disp + wt + drat)
)

sel.table <- out.put
sel.table$family <- NULL
sel.table <- round(sel.table, 2)
#! Use `get.models` to get the list of models in the same order as in the
sel.table <- cbind(
  Model = 
    #! Update (2): model number according to their original order, use:
    attr(out.put, "order"),
  #!     otherwise: seq(nrow(sel.table)),
  #!
  #! Update (2): add a large `width.cutoff` to `deparse` so that the result is
  #!         always a single string and `sapply` returns a character vector
  #!         rather than a list.
  #!         For oversize formulas, use `paste0(deparse(...), collapse = "")`  
  formula = sapply(get.models(out.put, TRUE),
                   function(mo) deparse(formula(mo)[[3]], width.cutoff = 500)),
  #!
  sel.table
)

#slightly rename intercept column  
colnames(sel.table)[3] <- 'Intercept' 

# #select summary columns for model selection table
# sel.table <- sel.table %>%
#   select(Model,formula,Intercept,df,logLik,AICc,delta,weight)

print <- flextable(head(sel.table))
print <- autofit(print)
print
Dave A
  • 1
  • 1
-1

Since your question isn't reproducible, i'll try with something else and maybe that's what you're looking for:

data = mtcars

models = list(
  model1 = lm(mpg ~ cyl, data = data),
  model2 = lm(mpg ~ cyl + hp, data = data)
  )

data.frame(name = as.character(lapply(models, `[[`, c(10,2))),
           other.column = NA)
#>             name other.column
#> 1      mpg ~ cyl           NA
#> 2 mpg ~ cyl + hp           NA

Created on 2020-05-28 by the reprex package (v0.3.0)

The formula (call) of a lm object is on position 10 of the list. You can actually count when you type model1$. You can use rownames() instead of a column, but that's not recommended.

EDIT AFTER REPRODUCIBLE EXAMPLE

library(MuMIn)

data = mtcars

models = list(
  model1 <- lm(mpg ~ cyl, data = data),
  model2 <- lm(mpg ~ cyl + hp, data = data),
  model3 <- lm(mpg ~ cyl * hp, data = data)
)

# create an object that contains all of the model selection information
out.put <- model.sel(models)

#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)[6:10]

# formulas as names
sel.table$name = as.character(lapply(models, `[[`, c(10,2)))

# reordering
sel.table = sel.table[, c(6,1,2,3,4,5)]

sel.table
#>             name df    logLik     AICc    delta    weight
#> 3      mpg ~ cyl  5 -78.14329 168.5943 0.000000 0.5713716
#> 1 mpg ~ cyl + hp  3 -81.65321 170.1636 1.569298 0.2607054
#> 2 mpg ~ cyl * hp  4 -80.78092 171.0433 2.449068 0.1679230

Created on 2020-05-31 by the reprex package (v0.3.0)

Alberson Miranda
  • 1,248
  • 7
  • 25
  • Thanks Alberson. I was able to get your code to work but unfortunately it's not quite what I was looking for. I've edited my original post so that the code is reproducable. Ideally, I would be able to simply add a column to the sel.table data frame that contains model formulas. Thanks in advance to anyone who might have additional insight into this topic! – Dave A May 29 '20 at 15:29
  • Not reproducible if you don't tell where does `model.sel()` comes from. To be clear, show your `library()` calls. – Alberson Miranda May 29 '20 at 18:32
  • Shoot, sorry! I've updated my code (at the top of the page) to include a call to MuMIn, the package model.sel is found in. – Dave A May 29 '20 at 23:43
  • After your edit i was able to reproduce your data and my previous solution worked in the very same way. Check my edit. – Alberson Miranda May 31 '20 at 03:51
  • Wow, that's an incredibly efficient solution. Thanks so much for this and your earlier response! I have many colleagues who use Excel as a workaround to list model formulas. I'll be sure to share your answer to this vexing aspect of modeling. – Dave A May 31 '20 at 05:50
  • @AlbersonMiranda @DaveA There are two serious problems in this code: (1) the order of items in `models` is different than in the table, so the names are wrongly assigned. (2) Extracting the formula with `model[[c(10, 2)]]` is a really bad idea. Use `formula(model)` instead. – Kamil Bartoń Jun 01 '20 at 14:46
  • Ok, thanks so much for pointing out these issues @KamilBartoń. Perhaps I'm missing something but it seems like the code you shared yesterday still leads to an error in model names. This bit of code seems to work but perhaps there are issues with it that I'm not aware of. #add a column for model (row) names sel.table$Model <- rownames(sel.table) – Dave A Jun 01 '20 at 16:53
  • @KamilBartoń I've appended the details to my original post along with a picture to more clearly describe the errors I've experienced. – Dave A Jun 02 '20 at 05:39
  • I suspect it is due to long formulas being cut into smaller strings by `deparse` so the result is a `list` rather than a character vector, and in turn gets distributed over columns by `cbind`. See my updated answer for a solution. – Kamil Bartoń Jun 02 '20 at 11:48
  • Oh, thanks @KamilBartoń. It's the first time i use your package. It seems `model.sel()` ranks *and* reorders the models by AICc (default). So using `formula()` instead of [[ wouldn't fix that. May i suggest a reorder argument (that could be set to FALSE)? Solution seems overly complicated for a otherwise simple task. – Alberson Miranda Jun 02 '20 at 15:46
  • @AlbersonMiranda ordering the models is the purpose of `model.sel`. The accessor function `get.models` does retrieve models in the correct order, I don't think it is overly complex. – Kamil Bartoń Jun 03 '20 at 14:42
  • @KamilBartoń and Alberson Miranda many thanks to both of you for your help in answering my question. It is very much appreciated and I am enjoying using this code for my model selection process. Thanks for making the code so concise and effective KamilBartoń! – Dave A Jun 08 '20 at 13:14