
I have some older code that I am trying to rework since funs() has been deprecated (I know, I'm way behind!). I often use the output that this style of summarise_if() gives, but I cannot get it to work with list().
Older Code:

iris_means <- iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, funs(N = n(), mean, sd, se = sd(.) / sqrt(n()))) %>%
  ungroup()

I tried the following because I thought I was getting the same error due to another package masking n(). Apparently I am doing something else wrong, though, as I still get this error:

Error in n():
! Must only be used inside data-masking verbs like mutate(), filter(), and group_by().

iris_means<-iris %>% 
  group_by(Species) %>% 
  dplyr::summarise_if(is.numeric,list(N=n(),mean,sd, se=sd(.)/sqrt(n()))) %>% 
  ungroup()

How can I update this code, before funs() is gone entirely, so that it works correctly and gives the same column names as before?

stefan
user3490557

1 Answer


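Using across() and where() you could rewrite your code like so. Note that n() has to be wrapped in a ~ lambda: inside a bare list() it is called immediately, outside any dplyr verb, which is what triggers the error.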

library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(across(
    where(is.numeric),
    list(N = ~ n(), mean = mean, sd = sd, se = ~ sd(.) / sqrt(n()))
  )) %>%
  ungroup()
#> # A tibble: 3 × 17
#>   Species    Sepal.Len…¹ Sepal…² Sepal…³ Sepal…⁴ Sepal…⁵ Sepal…⁶ Sepal…⁷ Sepal…⁸
#>   <fct>            <int>   <dbl>   <dbl>   <dbl>   <int>   <dbl>   <dbl>   <dbl>
#> 1 setosa              50    5.01   0.352  0.0498      50    3.43   0.379  0.0536
#> 2 versicolor          50    5.94   0.516  0.0730      50    2.77   0.314  0.0444
#> 3 virginica           50    6.59   0.636  0.0899      50    2.97   0.322  0.0456
#> # … with 8 more variables: Petal.Length_N <int>, Petal.Length_mean <dbl>,
#> #   Petal.Length_sd <dbl>, Petal.Length_se <dbl>, Petal.Width_N <int>,
#> #   Petal.Width_mean <dbl>, Petal.Width_sd <dbl>, Petal.Width_se <dbl>, and
#> #   abbreviated variable names ¹​Sepal.Length_N, ²​Sepal.Length_mean,
#> #   ³​Sepal.Length_sd, ⁴​Sepal.Length_se, ⁵​Sepal.Width_N, ⁶​Sepal.Width_mean,
#> #   ⁷​Sepal.Width_sd, ⁸​Sepal.Width_se
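To see where the error in the question comes from, the sketch below evaluates `list(N = n())` on its own: `list()` calls `n()` as soon as the list is built, with no data frame involved, so it fails before `summarise_if()` ever runs. A formula lambda such as `~ n()` is only stored until dplyr applies it inside the summary. As a consequence, the same list also works with the superseded `summarise_if()` if you would rather change as little of the old code as possible. This is a sketch assuming dplyr >= 1.0; `eager` and `fns` are illustrative names, and `tryCatch()` is only there to capture the error instead of stopping.

```r
library(dplyr)

# list() evaluates n() right away, outside any data-masking verb, so it errors
eager <- tryCatch(list(N = n()), error = function(e) e)
print(conditionMessage(eager))

# Formulas are stored unevaluated, so building this list is safe
fns <- list(N = ~ n(), mean = mean, sd = sd, se = ~ sd(.) / sqrt(n()))

# The same list works with the superseded summarise_if(), producing
# funs()-style names such as Sepal.Length_N and Sepal.Length_se
iris_means <- iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, fns) %>%
  ungroup()

print(names(iris_means))
```

For new code, across() is the recommended route; the summarise_if() variant is mainly useful as a minimal, drop-in fix.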

And with dplyr >= 1.1.0 we could get rid of group_by() + ungroup() by using the .by argument, whose result is always ungrouped (thanks to @Edo for the suggestion):

iris %>%
  summarise(
    across(
      where(is.numeric),
      list(N = ~ n(), mean = mean, sd = sd, se = ~ sd(.) / sqrt(n()))
    ),
    .by = Species
  )
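A quick way to convince yourself that the .by version is a drop-in replacement is to compute both versions and compare them. This is a sketch assuming dplyr >= 1.1.0; `fns`, `by_version` and `grouped_version` are illustrative names. Because summarise() usually returns the same type as its input, the .by call on the plain iris data frame gives a data.frame while the group_by() route gives a tibble, so both are converted with as.data.frame() before comparing values.

```r
library(dplyr)

fns <- list(N = ~ n(), mean = mean, sd = sd, se = ~ sd(.) / sqrt(n()))

# Per-operation grouping with .by: no group_by()/ungroup() needed
by_version <- iris %>%
  summarise(across(where(is.numeric), fns), .by = Species)

# Classic group_by() + summarise() + ungroup()
grouped_version <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), fns)) %>%
  ungroup()

# .by leaves no grouping behind
print(group_vars(by_version))

# Compare values, ignoring the data.frame vs tibble class difference
print(all.equal(as.data.frame(by_version), as.data.frame(grouped_version)))
```

One difference to keep in mind: .by keeps groups in order of first appearance, whereas group_by() sorts them. For iris the two orders happen to coincide.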
stefan
    upvoted! `ungroup` seems unnecessary. If there is more than one grouping variable, `.groups = "drop"` inside `summarise` seems to be the new standard over `ungroup`. – Edo Mar 21 '23 at 16:15
    @Edo. Yeah, of course. And with dplyr 1.1.0 I would personally go for `.by = Species` to get rid of the `group_by` too. :D – stefan Mar 21 '23 at 16:37
    that's cool! I haven't looked into it yet, so thanks for sharing! I think you should update your answer with the most up-to-date code then! I guess it would be more useful for OP too since he's looking to update his code :-) – Edo Mar 21 '23 at 17:38
    This is great, thanks all for the helpful updates! – user3490557 Mar 21 '23 at 19:12
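To illustrate Edo's point about `.groups = "drop"`: with a single grouping variable, summarise() already drops it, but with two or more it only peels off the last one by default. Since iris has just one factor, this sketch uses mtcars with `cyl` and `gear` as the grouping variables (an assumption made purely for illustration); `default_groups` and `dropped` are hypothetical names.

```r
library(dplyr)

# Default: summarise() drops only the last grouping level (gear),
# so the result is still grouped by cyl
default_groups <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

# .groups = "drop" returns an ungrouped tibble in one step
dropped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)), .groups = "drop")

print(group_vars(default_groups))
print(group_vars(dropped))
```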