0

I have a glm model and a summarized dataset that requires I bind the coefficients, standard error and p.value from the summary of the model to the summarized dataset. For an example, I used the mtcars data set. I added columns to the final unioned data set to mimic where I would like coefficients, standard errors, and p-values to be placed. In terms of the base values that aren't shown in the model, I would like to add a "1" to coefficients and use the intercept, standard errors and p-value. How could I do all of this?

library(tidyverse)

mtcars <- as_tibble(mtcars)

mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)

#run model
model1 <- glm(mpg ~ cyl + gear, data = mtcars)
summary(model1)

#start developing summarized data set
mtcars_wght <- mtcars %>%
  group_by(cyl) %>%
  rename(level = cyl) %>%
  summarise("sum_weight" = sum(wt)) %>%
  mutate("variable" = "cyl") 

mtcars_gear <- mtcars %>%
  group_by(gear) %>%
  summarise("sum_weight" = sum(wt)) %>%
  mutate("variable" = "gear") %>%
  rename(level = gear)

#make summarized data set example
mtcars_sum <- mtcars_wght %>%
  bind_rows(mtcars_gear) %>%
  mutate("coefficient" = "x", "std.error" = "y", "p_value" = "z")
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
Jordan
  • 1,415
  • 3
  • 18
  • 44
  • 2
    Can you post your desired output? – hpesoj626 Apr 05 '18 at 09:24
  • Doesn't answer the question, but in the case of a gaussain link `lm` and `glm` produce the same model. Or maybe your real life use case is more complex... – Rui Barradas Apr 05 '18 at 09:37
  • Hi @hpesoj626. The final table in the code with "x", "y", and "z" variables is what I want the final table to look like. Just with the "x", "y", and "z" being replaced by the coefficients, standard errors and p-values. – Jordan Apr 05 '18 at 09:48
  • Hi @RuiBarradas. The actual model will either by quasipoisson distribution or gamma with a log link. – Jordan Apr 05 '18 at 09:50
  • You can use `broom::tidy(model1)` to get the model coefficients etc in a dataframe – Relasta Apr 05 '18 at 09:51
  • Hi @Relasta. The issue is the fact that `r` combines the variable names and variables in the summary itself which makes it tricky to join. – Jordan Apr 05 '18 at 09:52

1 Answers1

1

Are you after this output?

# A tibble: 6 x 6
  level sum_weight variable coefficient std.error    p_value
  <chr>      <dbl> <chr>          <dbl>     <dbl>      <dbl>
1 4           25.1 cyl            NA        NA    NA        
2 6           21.8 cyl            -6.66      1.63  0.000353 
3 8           56.0 cyl           -10.5       1.96  0.0000109
4 3           58.4 gear           NA        NA    NA        
5 4           31.4 gear            1.32      1.93  0.498    
6 5           13.2 gear            1.50      1.85  0.426

Here is how you can arrive at this with dplyr and broom.

df <- rbind(mtcars_wght, mtcars_gear)
df <- df %>% mutate(
  level = paste0(variable, level)
) %>% select(-variable)

mod_summary <- model1 %>% broom::tidy()

left_join(df, mod_summary, by = c('level' = 'term')) %>% 
  mutate(variable = str_extract(level, '[a-z]+'),
         level = str_extract(level, '[0-9]+')) %>%
  rename(coefficient = estimate, p_value = p.value) %>%
  select(level, sum_weight, variable, coefficient, std.error, p_value)

Edit

If you want to include the Intercept, use full_join instead of left_join above. Below, I saved the output to thesummary.

thesummary <- full_join(df, mod_summary, by = c('level' = 'term')) %>% 
  mutate(variable = str_extract(level, '[A-Za-z]+'),
         level = str_extract(level, '[0-9]+')) %>%
  rename(coefficient = estimate, p_value = p.value) %>%
  select(level, sum_weight, variable, coefficient, std.error, p_value)

To assign 1 for missing values, for the last 4 columns only, do:

cbind(thesummary[,1:2], apply(thesummary[,3:6], 2, function(x) ifelse(is.na(x), 1, x)))

Here is the output:

  level sum_weight  variable coefficient std.error   p_value
1     4      25.14       cyl           1         1         1
2     6      21.82       cyl      -6.656     1.629 3.528e-04
3     8      55.99       cyl     -10.542     1.958 1.087e-05
4     3      58.39      gear           1         1         1
5     4      31.40      gear       1.324     1.928 4.980e-01
6     5      13.16      gear       1.500     1.855 4.257e-01
7  <NA>         NA Intercept      25.428     1.881 1.554e-13

If you want to replace every NA with `, simply do:

thesummary[is.na(thesummary)] <- 1

Here is the output.

# A tibble: 7 x 6
  level sum_weight variable  coefficient std.error  p_value
  <chr>      <dbl> <chr>           <dbl>     <dbl>    <dbl>
1 4          25.1  cyl              1.00      1.00 1.00e+ 0
2 6          21.8  cyl             -6.66      1.63 3.53e- 4
3 8          56.0  cyl            -10.5       1.96 1.09e- 5
4 3          58.4  gear             1.00      1.00 1.00e+ 0
5 4          31.4  gear             1.32      1.93 4.98e- 1
6 5          13.2  gear             1.50      1.85 4.26e- 1
7 1           1.00 Intercept       25.4       1.88 1.55e-13
hpesoj626
  • 3,529
  • 1
  • 17
  • 25
  • Thank you. Any easy way to add the intercept standard error and pvalues to the NA values and "1" to the NA coefficient values? – Jordan Apr 05 '18 at 10:02