GLM Parse Categorical Coefficients

Question

I'm trying to create tidy data, and I'm attempting to separate a character value from the field name.

#Example: 

data(mtcars)
library(broom)

#Adding some new character variables
mtcars1 <- mtcars
mtcars1$has_leter_yn <- ifelse(grepl("[[:digit:]]", rownames(mtcars)), "1 Y", "2 N")

mtcars1$first_letter <- substr(rownames(mtcars), 1,1)

mtcars1$cyl_yn <- ifelse(mtcars$cyl > 5, "Y", "N") 

mtcars1$am_yn <- ifelse(mtcars$am > 0.5, "N", "Y") 

mtcars1$hp_yn <- ifelse(mtcars$hp > 200, "POWER", "WEAK") 

#model 
mod <- glm(mpg ~ wt + first_letter + has_leter_yn + cyl_yn +am_yn + hp_yn
       , data =  mtcars1)

#broom tidy function
tidy(mod)

             term    estimate std.error   statistic      p.value
1      (Intercept) 25.46529192  5.155178  4.93974994 0.0001243241
2               wt -2.38861746  1.338905 -1.78400876 0.0922815327
3    first_letterC  4.63900549  3.751079  1.23671244 0.2330073046
4    first_letterD  0.95497914  3.332624  0.28655476 0.7779162451
5    first_letterF  3.78125890  3.337474  1.13297017 0.2729534747
6    first_letterH  4.74971469  3.163074  1.50161363 0.1515430178
7    first_letterL  4.21825272  3.570943  1.18127128 0.2537575961
8    first_letterM  2.81979616  3.149218  0.89539568 0.3830789592
9    first_letterP  3.44802708  3.445036  1.00086826 0.3309248121
10   first_letterT  4.24503396  3.581256  1.18534795 0.2521854417
11   first_letterV  0.90257581  3.474959  0.25973711 0.7981860095
12 has_leter_yn2 N  0.06314099  1.394756  0.04527028 0.9644194087
13         cyl_ynY -4.51802483  1.637415 -2.75924279 0.0134056327
14          am_ynY -1.33513554  1.827695 -0.73050238 0.4750310328
15       hp_ynWEAK  3.72962845  2.042696  1.82583649 0.0854925603

Is there a way to separate first_letter and C?

I would like to use the estimate, term, & character in a data frame for future use. Any help would be appreciated!

What do you mean? To only get "C" from `first_letterC` of the coefficient estimate names? — Roman Luštrik, May 04 '17 at 17:26
I want two columns. One with first_letter & one with C (and so on for each term) I'm not sure the term to describe it. — Ryan John, May 04 '17 at 17:30
Do you mean to have two columns in the output of the regression? Can you show what the final output should look like? — Roman Luštrik, May 04 '17 at 17:45

score 1 · Answer 1 · answered May 04 '17 at 17:57

Something like this?

xy <- tidy(mod)

# Terms that do not match the pattern are returned unchanged
data.frame(letter = gsub(pattern = "^(first_letter)([A-Z])$", replacement = "\\2", x = xy$term),
           prepend = gsub(pattern = "^(first_letter)([A-Z])$", replacement = "\\1", x = xy$term),
           oldterm = xy$term)

            letter         prepend         oldterm
1      (Intercept)     (Intercept)     (Intercept)
2               wt              wt              wt
3                C    first_letter   first_letterC
4                D    first_letter   first_letterD
5                F    first_letter   first_letterF
6                H    first_letter   first_letterH
7                L    first_letter   first_letterL
8                M    first_letter   first_letterM
9                P    first_letter   first_letterP
10               T    first_letter   first_letterT
11               V    first_letter   first_letterV
12 has_leter_yn2 N has_leter_yn2 N has_leter_yn2 N
13         cyl_ynY         cyl_ynY         cyl_ynY
14          am_ynY          am_ynY          am_ynY
15       hp_ynWEAK       hp_ynWEAK       hp_ynWEAK
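
If the split has to work for every categorical variable rather than only first_letter, one option is to skip the regex and build a lookup table from the levels stored on the fitted model. The following is a minimal sketch, assuming main effects with R's default treatment contrasts, so that each term name is the variable name pasted onto the level. It uses base R only; the names d, lookup, coefs and out are illustrative, and the term column is meant to line up with the term column of tidy(mod).

```r
# Rebuild the example model from the question (base R only)
d <- mtcars
d$has_leter_yn <- ifelse(grepl("[[:digit:]]", rownames(mtcars)), "1 Y", "2 N")
d$first_letter <- substr(rownames(mtcars), 1, 1)
d$cyl_yn <- ifelse(mtcars$cyl > 5, "Y", "N")
d$am_yn <- ifelse(mtcars$am > 0.5, "N", "Y")
d$hp_yn <- ifelse(mtcars$hp > 200, "POWER", "WEAK")
mod <- glm(mpg ~ wt + first_letter + has_leter_yn + cyl_yn + am_yn + hp_yn,
           data = d)

# mod$xlevels is a named list: one character vector of levels per
# categorical predictor. Pasting name and level reproduces the term names.
lookup <- do.call(rbind, lapply(names(mod$xlevels), function(v) {
  data.frame(term = paste0(v, mod$xlevels[[v]]),
             variable = v,
             level = mod$xlevels[[v]],
             stringsAsFactors = FALSE)
}))

# Coefficient table keyed by term name
coefs <- data.frame(term = names(coef(mod)),
                    estimate = unname(coef(mod)),
                    stringsAsFactors = FALSE)

# Left join: reference levels (not in coefs) drop out, and terms that are
# not categorical, such as wt and (Intercept), get NA in variable and level
out <- merge(coefs, lookup, by = "term", all.x = TRUE, sort = FALSE)

# For non-categorical terms, reuse the term itself as the variable name
no_match <- is.na(out$variable)
out$variable[no_match] <- out$term[no_match]
out
```

Because the levels come from the model object, this also copes with levels that contain digits or spaces, such as "2 N". It does not attempt to handle interaction terms or non-default contrasts.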

Roman Luštrik

That's really close! I'm trying to use this for the other variables like hp_yn or has_leter_yn etc. This is a toy example – Ryan John May 04 '17 at 18:08
@RyanJohn so it should work for all variables - to separate the last capital level letter (level) from the variable name? – Roman Luštrik May 04 '17 at 18:17
Is there a method to separate a number that's part of the category, i.e. categorical_var: 01:level, 02:level, 03:level ? – Ryan John May 04 '17 at 18:21
@RyanJohn I think it's best to talk in terms of (a reproducible) examples. Edit your question and make it as general as possible because things don't always scale in case of special cases. Perhaps you could also specify what your end game is. There could be an easier way. – Roman Luštrik May 04 '17 at 18:24
You're right - I need a better example. Thanks for your help - it's much appreciated! – Ryan John May 04 '17 at 18:46
