Hi, I am very new to R and to this forum.

I want to run multiple regressions on subsamples of a large dataset. Here is a sample of my dataset, named "totaldoc": [image: sample dataset]

I want to run lm(numericdiffNGO ~ numericdiffmeeting) for each value of issue_name1.

I tried this command:

 lapply(split(totaldoc, f = list(totaldoc$issue_name1)), function(x) lm(numericdiffNGO ~ numericdiffmeeting))

and this command:

ddply(totaldoc, "issue_name1", function(df) coefficients(lm(numericdiffNGO ~ numericdiffmeeting, data = df)))

But these only give me the coefficients, and not even for all values of issue_name1. What I want is the p-value for each issue_name1 subsample, ranked from the most significant to the least significant (lowest to highest p-value), and likewise the R-squared for each subsample, ranked in the reverse direction, from highest to lowest.

Oli
  • Welcome to SO! Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Apr 22 '22 at 16:14
  • Yes, I will pay attention to that next time. – RS100214162 Apr 23 '22 at 15:52

1 Answer

Here's a stab using mtcars:

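library(dplyr)
mtcars %>%
  group_nest(cyl) %>%   # one row per cyl, with that group's rows in `data`
  mutate(
    # fit one regression per group
    model = lapply(data, function(z) lm(mpg ~ disp, data = z)),
    summ = lapply(model, summary),
    # p-value of the slope (second coefficient row) and R-squared of each fit
    p.value = sapply(summ, function(z) coef(z)[2, "Pr(>|t|)"]),
    rsq = sapply(summ, `[[`, "r.squared")
  ) %>%
  # largest p-value first; use arrange(p.value) to put the most significant first
  arrange(-p.value)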
# # A tibble: 3 x 6
#     cyl                data model  summ       p.value    rsq
#   <dbl> <list<tibble[,10]>> <list> <list>       <dbl>  <dbl>
# 1     6            [7 x 10] <lm>   <smmry.lm> 0.826   0.0106
# 2     8           [14 x 10] <lm>   <smmry.lm> 0.0568  0.270 
# 3     4           [11 x 10] <lm>   <smmry.lm> 0.00278 0.648 
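
For the two rankings the question asks for (p-values from most to least significant, R-squared from highest to lowest), a base R sketch along the same lines might look like this. It again uses mtcars as a stand-in, and `fit_by_group` is a made-up helper name for illustration, not a function from any package.

```r
# Fit the same regression separately within each level of a grouping column
# and collect the slope p-value and R-squared of each fit.
# NOTE: fit_by_group is a hypothetical helper name used only for illustration.
fit_by_group <- function(df, group_col, formula) {
  pieces <- split(df, df[[group_col]])
  rows <- lapply(names(pieces), function(g) {
    summ <- summary(lm(formula, data = pieces[[g]]))
    data.frame(
      group   = g,
      p.value = coef(summ)[2, "Pr(>|t|)"],  # row 2 is the slope term
      rsq     = summ$r.squared
    )
  })
  do.call(rbind, rows)
}

res <- fit_by_group(mtcars, "cyl", mpg ~ disp)

# Most significant first: smallest p-value at the top.
by_p <- res[order(res$p.value), ]

# Best fit first: largest R-squared at the top.
by_rsq <- res[order(res$rsq, decreasing = TRUE), ]
```

Assuming the real data has the columns named in the question, the equivalent call would be `fit_by_group(totaldoc, "issue_name1", numericdiffNGO ~ numericdiffmeeting)`. Groups with too few rows, or with a constant predictor, may make the slope lookup fail, so those would likely need to be filtered out first.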
r2evans
  • Thank you so much! I will try this tomorrow and keep you posted! – RS100214162 Apr 26 '22 at 19:43
  • I don't understand why I get this error message: Error in `group_by()`: ! Must group by variables found in `.data`. x Column `issue_name1` is not found. Run `rlang::last_error()` to see where the error occurred. – RS100214162 Apr 30 '22 at 00:01
  • I don't know either, and there's nothing I can do without sample data. It works with this data; it's up to you to show me how it doesn't work with your data. – r2evans Apr 30 '22 at 00:16
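
The `group_by()` message says that the data frame reaching the grouping step has no column named exactly `issue_name1`. Without the real data the cause can only be guessed at (a misspelled or differently capitalized name, or a pipeline that still starts from `mtcars`, are possibilities), but a quick check of the column names narrows it down. A minimal sketch, with `missing_cols` as a made-up helper name and mtcars standing in for the real data:

```r
# Report which of the needed columns are absent from a data frame.
# NOTE: missing_cols is a hypothetical helper name used only for illustration.
missing_cols <- function(df, needed) {
  setdiff(needed, names(df))
}

# mtcars stands in for the real data here.
missing_cols(mtcars, c("cyl", "mpg", "disp"))   # character(0): all present
missing_cols(mtcars, c("issue_name1", "mpg"))   # "issue_name1"
```

Running the same check on the actual data frame, with the grouping column and both regression variables, would show which names need correcting before the pipeline is rerun.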