Pretty summaries for statistical models

Question

I am looking for a way to make statistical model summaries in R more readable. In the following example, I want the terms to appear as cyl_6 or cyl.6 instead of cyl6. How can I do that?

library(dplyr)
library(broom)

mean_mpg <- mean(mtcars$mpg)

# create a binary indicator: 1 if mpg (miles per US gallon) is above the mean, 0 otherwise

mtcars <-
  mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1, 0))

mtcars$cyl <- as.factor(mtcars$cyl)

model <-
  mtcars %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")

tidy(model)

This doesn't really have anything to do with "tidy" or "pretty" statistical summaries. You are simply trying to manipulate character strings. The easiest way to do that is using regular expressions. You should be able to adapt [this](https://stackoverflow.com/questions/56131210/regex-for-adding-underscore-before-capitalized-letters) to your needs, for example `sub("(\\d+)", "_\\1", term)`. — , Jul 26 '19 at 06:24
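The `sub()` call suggested in the comment above can be tried on a plain character vector first. This is a minimal sketch: the `term` vector is typed in by hand to mirror the question's `tidy()` output, and the last line shows a caveat, namely that any variable whose own name contains digits gets an underscore as well.

```r
# Term names as they appear in the question's tidy() output (typed by hand)
term <- c("(Intercept)", "cyl6", "cyl8", "vs", "am")

# Insert an underscore before the first run of digits in each string
renamed <- sub("(\\d+)", "_\\1", term)
print(renamed)
# -> "(Intercept)" "cyl_6" "cyl_8" "vs" "am"

# Caveat: a variable named "x1" is rewritten too, which may not be wanted
caveat <- sub("(\\d+)", "_\\1", "x1")
print(caveat)
# -> "x_1"
```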

Marius · Accepted Answer · 2019-07-25T05:49:30.290

I can think of one way to do this but it's pretty clunky: change the contrasts attribute for cyl (and any other factors you want to include) before running the model:

mtcars$cyl <- as.factor(mtcars$cyl)
cont = contrasts(mtcars$cyl)
colnames(cont) = paste0("_", colnames(cont))
contrasts(mtcars$cyl) = cont

model <-
    mtcars %>%
    select(cyl, vs, am, mpg_cat) %>%
    glm(formula = mpg_cat ~ .,
        data = ., family = "binomial")

tidy(model)

Output:

# A tibble: 5 x 5
  term        estimate std.error  statistic p.value
  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
1 (Intercept)   22.9      24034.  0.000953    0.999
2 cyl_6        -22.4      12326. -0.00182     0.999
3 cyl_8        -44.5      23246. -0.00191     0.998
4 vs            -1.59     13641. -0.000117    1.000
5 am             0.201    13641.  0.0000147   1.000

If you wanted this behaviour by default, I guess you could write a modified version of contr.treatment that sets the column names how you want and then set that as the default with options(contrasts = ...)? I haven't tested if that works.
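A rough sketch of that idea, under some assumptions: the wrapper name `contr.treatment_sep` is made up for illustration, and it has to be defined at top level (or in an attached package) so that R can look it up by name when it builds the design matrix. The example checks the design matrix column names rather than fitting the model, since `glm()` takes its coefficient names from those columns.

```r
# Hypothetical wrapper (the name is ours, not part of base R): treatment
# contrasts whose column names start with "_", so terms read cyl_6, not cyl6.
contr.treatment_sep <- function(n, ...) {
  cont <- contr.treatment(n, ...)
  colnames(cont) <- paste0("_", colnames(cont))
  cont
}

# Use it as the default for unordered factors; keep contr.poly for ordered ones.
old_opts <- options(contrasts = c(unordered = "contr.treatment_sep",
                                  ordered = "contr.poly"))

dat <- mtcars
dat$cyl <- factor(dat$cyl)
dat$mpg_cat <- as.integer(dat$mpg > mean(dat$mpg))

# Column names of the design matrix that glm() would use for this formula
design_cols <- colnames(model.matrix(mpg_cat ~ cyl + vs + am, data = dat))
print(design_cols)
# -> "(Intercept)" "cyl_6" "cyl_8" "vs" "am"

# Restore the previous contrasts setting
options(old_opts)
```

Because this changes a global option, every model fitted afterwards in the session is affected until the option is restored, which is worth keeping in mind before adopting it.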

I'm not sure there is a good way. I don't know of an elegant way to set attributes as part of a `dplyr` chain, and you need to set an attribute on the `cyl` column. See https://stackoverflow.com/questions/25662859/adding-attributes-in-chaining-way-in-dplyr-package for ideas. — Marius, Jul 26 '19 at 03:30
Thanks. My 2nd question is: how can I write a function so that I can do it for my factor variables? — Hamideh, Jul 26 '19 at 03:37
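One possible way to wrap the contrasts trick from the answer in a function is sketched below. The function name `prefix_factor_contrasts` and its `sep` argument are hypothetical. It assumes you only want to touch unordered factors with at least two levels, because `contrasts()` raises an error for single-level factors and ordered factors already get names such as `.L` and `.Q`.

```r
# Hypothetical helper: prefix the contrast column names of every unordered
# factor in a data frame, so model terms read var_level instead of varlevel.
prefix_factor_contrasts <- function(data, sep = "_") {
  for (nm in names(data)) {
    x <- data[[nm]]
    if (is.factor(x) && !is.ordered(x) && nlevels(x) > 1) {
      cont <- contrasts(x)
      colnames(cont) <- paste0(sep, colnames(cont))
      contrasts(data[[nm]]) <- cont
    }
  }
  data
}

dat <- mtcars
dat$cyl <- factor(dat$cyl)
dat$gear <- factor(dat$gear)
dat <- prefix_factor_contrasts(dat)

# Column names that a model fitted on `dat` would use for these predictors
design_cols <- colnames(model.matrix(~ cyl + gear + vs, data = dat))
print(design_cols)
# -> "(Intercept)" "cyl_6" "cyl_8" "gear_4" "gear_5" "vs"
```

The function returns a modified copy, so it can sit as one step in a pipe before `glm()`.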

score 0 · Answer 2 · answered Jul 25 '19 at 05:39

Just use sub, for instance, in a pipe.
I start by simplifying the model code.

model <-
  mtcars %>%
  mutate(mpg_cat = as.integer(mpg > mean(mpg)),
         cyl = factor(cyl)) %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")

Now it's a matter of applying a regex:

"^cyl" matches "cyl" at the beginning of the string.

And the pipe would be

model %>%
  tidy() %>%
  mutate(term = sub("^cyl", "cyl_", term))
## A tibble: 5 x 5
#  term        estimate std.error  statistic p.value
#  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
#1 (Intercept)   22.9      24034.  0.000953    0.999
#2 cyl_6        -22.4      12326. -0.00182     0.999
#3 cyl_8        -44.5      23246. -0.00191     0.998
#4 vs            -1.59     13641. -0.000117    1.000
#5 am             0.201    13641.  0.0000147   1.000

Makes sense but in my real model, I have around 20 variable-level strings and I don't want to manipulate the term column for each call of the glm function. — Hamideh, Jul 25 '19 at 05:53
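To avoid writing one `sub()` call per variable, the renaming could be driven by the `xlevels` component that `lm` and `glm` fits store, which lists the levels of each factor used in the model. The sketch below is one way to do that; `add_level_separator` is a hypothetical name, the demo uses `lm()` only to keep the example free of fitting warnings, and it matches main-effect terms exactly, so interaction terms such as `cyl6:am` are left untouched.

```r
# Hypothetical helper: rewrite "varlevel" as "var<sep>level" for every factor
# recorded in a fitted model's xlevels component (main effects only).
add_level_separator <- function(term, xlevels, sep = "_") {
  for (var in names(xlevels)) {
    for (lev in xlevels[[var]]) {
      term[term == paste0(var, lev)] <- paste0(var, sep, lev)
    }
  }
  term
}

dat <- mtcars
dat$cyl <- factor(dat$cyl)
dat$gear <- factor(dat$gear)

# lm() is used for the demo; glm fits carry the same xlevels component
fit <- lm(mpg ~ cyl + gear + wt, data = dat)

new_terms <- add_level_separator(names(coef(fit)), fit$xlevels)
print(new_terms)
# -> "(Intercept)" "cyl_6" "cyl_8" "gear_4" "gear_5" "wt"
```

With broom, the same helper could be applied once to the `term` column, for example `mutate(term = add_level_separator(term, model$xlevels))` after `tidy(model)`.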
