I am looking for a clean way to display statistical model summaries in R. In the following example, I want the factor-level terms to read cyl_6 or cyl.6 instead of cyl6. How can I do that?

library(dplyr)
library(broom)

mean_mpg <- mean(mtcars$mpg)

# Binary indicator: 1 if miles per (US) gallon is above the mean, 0 otherwise
mtcars <-
  mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1, 0))

mtcars$cyl <- as.factor(mtcars$cyl)

model <-
  mtcars %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")


tidy(model)

Hamideh
  • This doesn't really have anything to do with "tidy" or "pretty" statistical summaries. You are simply trying to manipulate character strings. The easiest way to do that is using regular expressions. You should be able to adapt [this](https://stackoverflow.com/questions/56131210/regex-for-adding-underscore-before-capitalized-letters) to your needs, for example `sub("(\\d+)", "_\\1", term)`. –  Jul 26 '19 at 06:24

2 Answers

I can think of one way to do this but it's pretty clunky: change the contrasts attribute for cyl (and any other factors you want to include) before running the model:

mtcars$cyl <- as.factor(mtcars$cyl)
cont <- contrasts(mtcars$cyl)
colnames(cont) <- paste0("_", colnames(cont))
contrasts(mtcars$cyl) <- cont

model <-
    mtcars %>%
    select(cyl, vs, am, mpg_cat) %>%
    glm(formula = mpg_cat ~ .,
        data = ., family = "binomial")

tidy(model)

Output:

# A tibble: 5 x 5
  term        estimate std.error  statistic p.value
  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
1 (Intercept)   22.9      24034.  0.000953    0.999
2 cyl_6        -22.4      12326. -0.00182     0.999
3 cyl_8        -44.5      23246. -0.00191     0.998
4 vs            -1.59     13641. -0.000117    1.000
5 am             0.201    13641.  0.0000147   1.000
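
The three contrast lines above can be wrapped in a small helper so that every factor column gets the same treatment. This is only a sketch: `prefix_contrasts` and its `sep` argument are hypothetical names, not part of base R, dplyr or broom. It uses `model.matrix()` to show the resulting column names without fitting a model.

```r
# Hypothetical helper: prefix the contrast column names of every factor
# column in a data frame with `sep`, so "cyl6" becomes "cyl_6".
prefix_contrasts <- function(data, sep = "_") {
  for (nm in names(data)) {
    col <- data[[nm]]
    # contrasts() needs a factor with at least two levels
    if (is.factor(col) && nlevels(col) >= 2) {
      cont <- contrasts(col)
      colnames(cont) <- paste0(sep, colnames(cont))
      contrasts(data[[nm]]) <- cont
    }
  }
  data
}

df <- mtcars                 # work on a copy of the built-in data set
df$cyl <- factor(df$cyl)
df <- prefix_contrasts(df)

# The design matrix (and hence the model terms) now carries the new names
cols <- colnames(model.matrix(~ cyl + vs + am, data = df))
print(cols)
# "(Intercept)" "cyl_6" "cyl_8" "vs" "am"
```

Because the helper takes a data frame and returns one, it can also sit inside a pipe, for example `mtcars %>% mutate(cyl = factor(cyl)) %>% prefix_contrasts()`.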

If you wanted this behaviour by default, I guess you could write a modified version of contr.treatment that sets the column names how you want and then set that as the default with options(contrasts = ...)? I haven't tested if that works.
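
A minimal sketch of that idea follows. The wrapper name `contr.treatment_sep` is made up for illustration, and the approach assumes the function is defined at top level (in the global environment) so that R can look it up by name when it builds the design matrix.

```r
# Hypothetical wrapper around contr.treatment() that prefixes the contrast
# column names with "_". Same arguments as contr.treatment().
contr.treatment_sep <- function(n, base = 1, contrasts = TRUE, sparse = FALSE) {
  cont <- contr.treatment(n, base = base, contrasts = contrasts, sparse = sparse)
  if (contrasts) colnames(cont) <- paste0("_", colnames(cont))
  cont
}

# Use the wrapper for unordered factors; keep contr.poly for ordered ones.
old_opts <- options(contrasts = c("contr.treatment_sep", "contr.poly"))

df <- mtcars                 # copy of the built-in data set
df$cyl <- factor(df$cyl)     # no per-column contrasts attribute is set here

cols_opt <- colnames(model.matrix(~ cyl + vs + am, data = df))
print(cols_opt)

options(old_opts)            # restore the previous default
```

Changing a global option affects every model fitted afterwards in the session, so restoring the old value once done is usually the safer habit.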

Marius
  • How can I write those three lines of code in dplyr? – Hamideh Jul 26 '19 at 03:16
  • I'm not sure there is a good way - I don't know of an elegant way to set attributes as part of a `dplyr` chain, and you need to set an attribute on the `cyl` column. See https://stackoverflow.com/questions/25662859/adding-attributes-in-chaining-way-in-dplyr-package for ideas. – Marius Jul 26 '19 at 03:30
  • Thanks. My 2nd question is: how can I write a function so that I can do it for my factor variables? – Hamideh Jul 26 '19 at 03:37

Just use sub, for instance, in a pipe.
I start by simplifying the model code.

model <-
  mtcars %>%
  mutate(mpg_cat = as.integer(mpg > mean(mpg)),
         cyl = factor(cyl)) %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")

Now it's a matter of applying a regex:

  • "^cyl" matches "cyl" at the beginning of the string.

And the pipe would be

model %>%
  tidy() %>%
  mutate(term = sub("^cyl", "cyl_", term))
## A tibble: 5 x 5
#  term        estimate std.error  statistic p.value
#  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
#1 (Intercept)   22.9      24034.  0.000953    0.999
#2 cyl_6        -22.4      12326. -0.00182     0.999
#3 cyl_8        -44.5      23246. -0.00191     0.998
#4 vs            -1.59     13641. -0.000117    1.000
#5 am             0.201    13641.  0.0000147   1.000
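
With many factors, writing one regex per variable gets tedious. A possible generalisation, sketched below, reads the factor names and levels from the fitted model's `xlevels` component and renames every matching term in one go. `pretty_terms` is a hypothetical helper, it only handles main-effect terms (not interactions), and the example uses `lm()` on other mtcars columns purely to keep it short.

```r
# Hypothetical helper: insert `sep` between factor name and level for every
# factor recorded in `xlevels` (as stored in lm/glm fits).
pretty_terms <- function(terms, xlevels, sep = "_") {
  for (v in names(xlevels)) {
    old <- paste0(v, xlevels[[v]])        # e.g. "cyl4", "cyl6", "cyl8"
    new <- paste0(v, sep, xlevels[[v]])   # e.g. "cyl_4", "cyl_6", "cyl_8"
    hit <- match(terms, old)              # position in `old`, NA if no match
    terms[!is.na(hit)] <- new[hit[!is.na(hit)]]
  }
  terms
}

df <- mtcars                              # copy of the built-in data set
df$cyl <- factor(df$cyl)
df$gear <- factor(df$gear)

fit <- lm(mpg ~ cyl + gear + wt, data = df)
renamed <- pretty_terms(names(coef(fit)), fit$xlevels)
print(renamed)
# "(Intercept)" "cyl_6" "cyl_8" "gear_4" "gear_5" "wt"
```

In the pipe above, the same helper could replace the `sub()` call, for example `mutate(term = pretty_terms(term, model$xlevels))`.
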
Rui Barradas
  • Makes sense but in my real model, I have around 20 variable-level strings and I don't want to manipulate the term column for each call of the glm function. – Hamideh Jul 25 '19 at 05:53