Data Munging Challenge: How do I join the correct coefficients to the correct observation in a summarized table?

Question

Before I start, a basic answer to this question can be found here: Correctly binding coefficients to summarized table

This question differs in that I need to join each coefficient to the correct position in the summary table based on where a knot is placed. I use the I(pmax(0, variable - knot)) technique to place my splines. The end result should be a table of the unique values of each variable, a summarized measure, and the correct model statistics (see my final, as yet unfinished, table in the example code below).

library(tidyverse)
library(broom)

#pull in and gather data
mtcars1 <- as_tibble(mtcars)
mtcars1$cyl <- as.factor(mtcars1$cyl)
#run model and produce model-summary table
model <- glm(mpg ~ cyl + hp + I(pmax(0, hp - 100)), data = mtcars1)

model_summary <- tidy(model)

#produce final summary table
summary_table <- mtcars1 %>%
  select(cyl, hp, wt) %>%
  gather(key = variable, level, - wt) %>%
  group_by(variable, level) %>%
  summarise("sum_wt" = sum(wt)) %>%
  mutate(term = paste0(variable, level)) %>%
  left_join(model_summary, by = c("term" = "term"))

The challenge is to take the I(pmax(0, hp - 100)) term in the model_summary table and join its estimate, std.error, statistic and p.value to each hp observation in the summary_table that is <= 100, and to join the statistics of the plain hp term to each hp observation in the summary_table that is > 100.
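One possible way to express that rule, shown only as a sketch: build the `term` key for the hp rows conditionally on the knot, then join as before. The names `knot`, `spline_term` and `hp_table` are illustrative and not part of the original code, and the sketch covers only the hp rows. It follows the assignment exactly as stated above (spline-term statistics for hp <= 100); if the opposite assignment is intended, swap the two branches of `if_else()`.

```r
library(dplyr)
library(broom)

# Same model as in the question
mtcars1 <- as_tibble(mtcars) %>% mutate(cyl = as.factor(cyl))
model <- glm(mpg ~ cyl + hp + I(pmax(0, hp - 100)), data = mtcars1)
model_summary <- tidy(model)

# Illustrative helpers (assumed names): the knot location and the
# name of the spline term as it appears in the tidy() output
knot <- 100
spline_term <- grep("pmax", model_summary$term, value = TRUE, fixed = TRUE)

# One row per unique hp value, keyed to a model term by the knot rule
hp_table <- mtcars1 %>%
  group_by(level = hp) %>%
  summarise(sum_wt = sum(wt), .groups = "drop") %>%
  mutate(
    variable = "hp",
    # Rule as stated in the question: spline term for hp <= knot,
    # plain hp term otherwise
    term = if_else(level <= knot, spline_term, "hp")
  ) %>%
  left_join(model_summary, by = "term")
```

The cyl rows can still be matched with the `paste0(variable, level)` key from the original pipeline, since each factor level has its own term in `model_summary`.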

Perhaps look at `fuzzyjoin` https://www.rdocumentation.org/packages/fuzzyjoin/versions/0.1.3 defining a custom join function — Andrew Lavers, Apr 05 '18 at 13:10
`hp` is not a categorical variable; that is why the model summary will not include a term for each observation in `hp`. Also, use `full_join` instead of `left_join`, since the latter will drop the unmatched rows of `model_summary`. — hpesoj626, Apr 05 '18 at 13:16
Yes @hpesoj626. There lies the challenge. I need to match the correct numerically based coefficient to a categorical observation. There is a specific visual output required that needs the data this way. I currently use proprietary software that does this, but I am advocating for R to replace it. — Jordan, Apr 05 '18 at 13:17
No, I wouldn't say the predicted values. We use the estimates (or the exponent of them) to judge where each level of a variable is relative to the others. If we're happy with them, then it's done; if not, we'll tweak the knots or add/drop variables until we are. — Jordan, Apr 05 '18 at 13:34
Hi @rawr. The table, `summary_table` produced in the code would be the output, only with the correct coefficients from the `model_summary` table instead of `null` values. — Jordan, Apr 05 '18 at 16:51
yeah but you would have two hp? one for `hp` and one for `I(hp)`? Now the summary table has a row for each unique hp, but it also sounds like you want them all filled in with something — rawr, Apr 05 '18 at 16:55
My apologies @rawr. The I(hp) model stats should go to all hp levels in the `summary_table` that are less than or equal to 100. The rest of the hp levels should have the hp stats in them. Does that make sense? — Jordan, Apr 05 '18 at 16:57
