
I'm struggling to get the following code working. My data is a data.frame of results from a physical test. The athletes who did the test are classified by a 'Sport Specific' parameter.

wingate_benchmarks <- wingate_data %>%
  select(`Sport Specific`, `Minimum power output`, `Average power output`,
         `Relative Average Power`, `Peak Power`, `Time to peak`, `Max. RPM`,
         `Time to Max. RPM`, `Total Work`) %>%
  group_by(`Sport Specific`) %>%
  dplyr::summarize_at(vars(`Minimum power output`, `Average power output`,
                           `Relative Average Power`, `Peak Power`, `Time to peak`,
                           `Max. RPM`, `Time to Max. RPM`, `Total Work`),
                      list(mean = mean, sd = sqrt((n()-1)/n())*sd))

If I use plain sd, it calculates the standard deviation as if the data were a sample, but the data should be treated as the full population. Hence the extra sqrt((n()-1)/n()) factor.
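For reference, a quick base-R check of what that factor does: stats::sd() divides by n - 1, and multiplying by sqrt((n - 1)/n) turns the divisor into n, which is the population standard deviation. The vector below is made-up illustration data, chosen so the population value comes out to exactly 2.

```r
# Hypothetical toy vector: mean 5, sum of squared deviations 32, n = 8
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sample_sd <- sd(x)                     # stats::sd() divides by n - 1
pop_sd    <- sqrt((n - 1) / n) * sd(x) # rescaled so the divisor becomes n

# Population SD straight from its definition: root of the mean squared deviation
pop_sd_direct <- sqrt(mean((x - mean(x))^2))

print(c(sample = sample_sd, population = pop_sd, direct = pop_sd_direct))
```

Both pop_sd and pop_sd_direct equal 2 (up to floating-point rounding), while the sample version is slightly larger.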

But R keeps returning: Error: n() must only be used inside dplyr verbs.

Is there any way to solve this? Thanks!

Emiel C
    If you're doing more than just providing the name of the function, you need to provide a "purrr-style lambda". Something like `se = ~sd(.)*sqrt((n()-1)/n())`. Also, `summarise_at` has been superseded by `across()`. Note how to reference the base function `sd` in the lambda. – Limey Aug 13 '21 at 11:00
  • It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Aug 13 '21 at 11:47


Here's an attempt; I'm not certain it will work with your data.

library(dplyr)

wingate_data %>%
  select(`Sport Specific`, `Minimum power output`, `Average power output`,
         `Relative Average Power`, `Peak Power`, `Time to peak`,
         `Max. RPM`, `Time to Max. RPM`, `Total Work`) %>%
  group_by(`Sport Specific`) %>%
  dplyr::summarize(
    # the columns must be wrapped in c(): across() takes one column selection
    # as its first argument and the function(s) as its second
    across(c(`Minimum power output`, `Average power output`, `Relative Average Power`,
             `Peak Power`, `Time to peak`, `Max. RPM`, `Time to Max. RPM`, `Total Work`),
           list(mean = ~ mean(.), sd = ~ sqrt((n() - 1) / n()) * sd(.)))
  )
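Since the real wingate_data isn't available, here is a minimal reproducible sketch on hypothetical toy data (wingate_toy and its numbers are invented; only two of the question's column names are reused). It shows that backtick-quoted names containing spaces work inside c() within across(), and that the resulting columns are named "{column}_{function}".

```r
library(dplyr)

# Hypothetical stand-in for wingate_data: made-up values, two measurement columns
wingate_toy <- tibble(
  `Sport Specific` = c("Cycling", "Cycling", "Rowing", "Rowing"),
  `Peak Power`     = c(1000, 1200, 900, 1100),
  `Total Work`     = c(20, 24, 18, 22)
)

res <- wingate_toy %>%
  group_by(`Sport Specific`) %>%
  summarize(
    across(c(`Peak Power`, `Total Work`),
           list(mean = ~ mean(.), sd = ~ sqrt((n() - 1) / n()) * sd(.)))
  )

print(res)
```

With two athletes per group, each population SD is half the gap between the two values, e.g. 100 for Peak Power in both groups.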

We can see it in action using mtcars:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarize(
    across(vs:carb,
           list(mean = ~ mean(.), sd = ~ sqrt((n()-1)/n()) * sd(.))
           ))
# # A tibble: 3 x 9
#     cyl vs_mean vs_sd am_mean am_sd gear_mean gear_sd carb_mean carb_sd
#   <dbl>   <dbl> <dbl>   <dbl> <dbl>     <dbl>   <dbl>     <dbl>   <dbl>
# 1     4   0.909 0.287   0.727 0.445      4.09   0.514      1.55   0.498
# 2     6   0.571 0.495   0.429 0.495      3.86   0.639      3.43   1.68 
# 3     8   0     0       0.143 0.350      3.29   0.700      3.5    1.5  

As @Limey said in their comment, the summarize_* functions have been superseded by across(), which generally takes two arguments: the columns (selected tidyselect-style) and the function(s) to apply to them.
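Because the first argument is a tidyselect expression, the long list of column names isn't strictly necessary. A sketch using everything(): inside summarize(), across() never selects the grouping columns, so after group_by(cyl) it covers only the remaining columns. The same idea would apply to the question's pipeline after its select() and group_by() steps.

```r
library(dplyr)

# everything() skips the grouping column cyl, so only vs, am, gear, carb are summarized
res <- mtcars %>%
  select(cyl, vs:carb) %>%
  group_by(cyl) %>%
  summarize(
    across(everything(),
           list(mean = mean, sd = ~ sqrt((n() - 1) / n()) * sd(.x)))
  )

print(res)
```

This gives the same nine columns as the vs:carb version above, with no cyl_mean or cyl_sd.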

The functions can be provided in several ways:

  • a bare function name, summarize(across(vs:carb, mean));
  • an anonymous function, summarize(across(vs:carb, function(z) mean(z)/2));
  • a purrr-style tilde lambda, summarize(across(vs:carb, ~ mean(.))), where . (or .x) stands for the current column as a vector; or
  • a named list of any of the above, as in the mtcars example above.
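One more option along the lines of the first bullet: define the population SD as an ordinary named function. pop_sd below is a hypothetical helper, not something from base R or dplyr. Because it computes n from its input rather than calling n(), it never needs the data-mask context, so it works as a bare function in across() and, as a sketch, also in the question's original summarize_at() call.

```r
library(dplyr)

# Hypothetical helper: population SD = root of the mean squared deviation (divides by n).
# Note: no NA handling; add na.rm logic if the data can contain missing values.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

res <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(vs:carb, list(mean = mean, sd = pop_sd)))

# The superseded summarize_at() accepts the same named list of plain functions
res_at <- mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(vs:carb), list(mean = mean, sd = pop_sd))

print(res)
```

The sd columns match the sqrt((n()-1)/n()) * sd(.) results shown earlier, e.g. 1.5 for carb in the 8-cylinder group.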
r2evans