RStudio: finding p-values and R-squared for each subsample

Question

Hi, I am very new to R and to this forum.

I want to run multiple regressions on subsamples of a large dataset. Here is a sample of my dataset, named "totaldoc": [sample dataset image]

I want to run lm(numericdiffNGO ~ numericdiffmeeting) for each issue_name1.

I tried this command:

lapply(split(totaldoc, f = list(totaldoc$issue_name1)), function(x) lm(numericdiffNGO ~ numericdiffmeeting, data = x))

and this one:

library(plyr)
ddply(totaldoc, "issue_name1", function(df) coefficients(lm(numericdiffNGO ~ numericdiffmeeting, data = df)))

But it only gives me the coefficients, and not even for every issue_name1. What I want is the p-value for each issue_name1 subsample, ranked from the most significant (lowest p-value) to the least significant (highest). And the same for R-squared, but in reverse order: from the highest to the lowest.
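A base-R sketch of the goal, building on the split()/lapply() attempt: summary() of each fitted model holds both the coefficient table (with its Pr(>|t|) column) and r.squared, so both values can be collected into one data frame and sorted twice. Because the real totaldoc data was only posted as an image, this assumes the built-in mtcars data as a stand-in, with cyl playing issue_name1 and mpg ~ disp playing numericdiffNGO ~ numericdiffmeeting; the names groups, results, by_p and by_rsq are illustrative.

```r
# Assumption: mtcars stands in for `totaldoc`, `cyl` for `issue_name1`,
# mpg for numericdiffNGO and disp for numericdiffmeeting.
groups <- split(mtcars, mtcars$cyl)

results <- do.call(rbind, lapply(names(groups), function(g) {
  fit  <- lm(mpg ~ disp, data = groups[[g]])  # pass each subset via `data =`
  summ <- summary(fit)
  data.frame(
    group   = g,
    p.value = coef(summ)["disp", "Pr(>|t|)"], # p-value of the slope
    rsq     = summ$r.squared,                 # R-squared of the fit
    stringsAsFactors = FALSE
  )
}))

# Most significant first (smallest p-value at the top)
by_p <- results[order(results$p.value), ]
# Best fit first (largest R-squared at the top)
by_rsq <- results[order(results$rsq, decreasing = TRUE), ]

print(by_p)
print(by_rsq)
```

With the real data, the grouping vector would be totaldoc$issue_name1 and the formula numericdiffNGO ~ numericdiffmeeting, with the "disp" row name replaced by "numericdiffmeeting".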

Welcome to SO! Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. — r2evans, Apr 22 '22 at 16:14
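For example, dput() prints a definition of a data frame that others can paste straight into their own session. In this sketch, mtcars stands in for totaldoc and the chosen columns are only illustrative; the object names snippet and rebuilt are hypothetical.

```r
# Print a copy-pasteable definition of the first few rows.
# Assumption: `mtcars` stands in for `totaldoc`; pick the columns the question needs.
snippet <- head(mtcars[, c("cyl", "mpg", "disp")], 5)
dput(snippet)

# Whoever reads the post can rebuild the same object from the printed text:
rebuilt <- eval(parse(text = capture.output(dput(snippet))))
stopifnot(isTRUE(all.equal(snippet, rebuilt)))
```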

score 0 · Answer 1 · answered Apr 23 '22 at 17:55


Here's a stab using the built-in mtcars dataset, with cyl in the role of issue_name1:

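library(dplyr)
mtcars %>%
  group_nest(cyl) %>%                  # one row per group; each subset sits in list-column `data`
  mutate(
    model = lapply(data, function(z) lm(mpg ~ disp, data = z)),  # fit one model per group
    summ = lapply(model, summary),                              # summary.lm for each fit
    p.value = sapply(summ, function(z) coef(z)[2, "Pr(>|t|)"]),  # p-value of the slope
    rsq = sapply(summ, `[[`, "r.squared")                        # R-squared of each fit
  ) %>%
  arrange(-p.value)  # largest p-value first; arrange(p.value) puts the most significant first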
# # A tibble: 3 x 6
#     cyl                data model  summ       p.value    rsq
#   <dbl> <list<tibble[,10]>> <list> <list>       <dbl>  <dbl>
# 1     6            [7 x 10] <lm>   <smmry.lm> 0.826   0.0106
# 2     8           [14 x 10] <lm>   <smmry.lm> 0.0568  0.270 
# 3     4           [11 x 10] <lm>   <smmry.lm> 0.00278 0.648
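The p.value column above is the p-value of the slope's t-test (row 2 of the coefficient table). With a single predictor, that is the same number as the p-value of the model's overall F-test, and R-squared equals the squared Pearson correlation of the two variables. A quick base-R check on one subsample (the 4-cylinder cars); the variable names are illustrative:

```r
# One subsample, as in the answer: 4-cylinder cars only
sub  <- subset(mtcars, cyl == 4)
summ <- summary(lm(mpg ~ disp, data = sub))

# p-value of the slope (second row of the coefficient table)
p_slope <- coef(summ)[2, "Pr(>|t|)"]

# p-value of the overall F-test, computed from the stored F statistic
f   <- summ$fstatistic
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# With one predictor the two p-values agree (t^2 equals F) ...
stopifnot(isTRUE(all.equal(p_slope, p_f)))
# ... and R-squared is the squared correlation between mpg and disp
stopifnot(isTRUE(all.equal(summ$r.squared, cor(sub$mpg, sub$disp)^2)))
```

So for simple regressions like these, ranking by the slope p-value and ranking by the overall model p-value give the same order.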

r2evans

Thank you so much! I'll try this tomorrow and keep you posted! – RS100214162 Apr 26 '22 at 19:43
I don't understand why I get this error message: Error : in `group_by()`: ! Must group by variables found in `.data`. x Column `issue_name1` is not found. Run `rlang::last_error()` to see where the error occurred. – RS100214162 Apr 30 '22 at 00:01
I don't know either, and there's nothing I can do without sample data. It works with this data, it's up to you to show me how it doesn't work with your data. – r2evans Apr 30 '22 at 00:16
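The error means that, when group_nest() runs, the data frame has no column named exactly issue_name1. One way to narrow this down is to check the names programmatically before grouping. In the sketch below, check_group_col() is a hypothetical helper, and the toy totaldoc (with a stray trailing space in the column name) is made up purely to illustrate one possible cause; it is not the asker's data.

```r
# Hypothetical helper: stop early with a readable message if a grouping
# column is missing, listing approximately matching column names.
check_group_col <- function(df, col) {
  if (col %in% names(df)) return(invisible(TRUE))
  close <- agrep(col, names(df), value = TRUE, max.distance = 0.2)
  hint  <- if (length(close)) paste0("'", close, "'", collapse = ", ") else "none"
  stop(sprintf("Column '%s' not found. Closest names: %s", col, hint),
       call. = FALSE)
}

# Made-up data whose column name has a trailing space (illustration only)
totaldoc <- data.frame(`issue_name1 ` = c("a", "b"), check.names = FALSE)

res <- tryCatch(check_group_col(totaldoc, "issue_name1"),
                error = function(e) conditionMessage(e))
print(res)
```

If the check passes on the real data, adapting the answer should mostly be a matter of swapping in totaldoc, group_nest(issue_name1) and lm(numericdiffNGO ~ numericdiffmeeting, data = z).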
