Why are group_by and summarise not working?

Question

I'm not sure what I'm doing wrong. I have the following dataset:

install.packages("random")
library("random")
library("dplyr")  # provides bind_rows(), %>%, group_by() and summarise()


df <- data.frame(V1 = randomNumbers(n = 18,min = 1,max = 20, col=1),
                 factor_col = c(rep("A", 18)),
                 mouse_ID = c(1:18))

df2 <- data.frame(V1 = randomNumbers(n = 14,min = 1,max = 20, col=1),
                  factor_col = c(rep("B", 14)),
                  mouse_ID = c(1:14))

df3 <- data.frame(V1 = randomNumbers(n = 13,min = 1,max = 20, col=1),
                  factor_col = c(rep("C", 13)),
                  mouse_ID = c(6:18))

Table = bind_rows(df, df2)
Table = bind_rows(Table, df3)

Table$mouse_ID = as.factor(Table$mouse_ID)
Table$factor_col = as.factor(Table$factor_col)
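`randomNumbers()` fetches its values from random.org over the network, so the data change on every run and the code cannot run offline. Below is a minimal offline sketch of the same data set. It assumes base R's `sample()` with a fixed seed is an acceptable stand-in. The helper `make_group()` and the seed value are illustrative choices, not part of the original:

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the draws are reproducible

# Hypothetical helper: one block of rows per factor level, with the given mouse IDs
make_group <- function(ids, label) {
  data.frame(V1 = sample(1:20, length(ids), replace = TRUE),
             factor_col = label,
             mouse_ID = ids)
}

# Same structure as above: groups A, B, C with overlapping mouse IDs
Table <- bind_rows(make_group(1:18, "A"),
                   make_group(1:14, "B"),
                   make_group(6:18, "C"))

Table$mouse_ID <- as.factor(Table$mouse_ID)
Table$factor_col <- as.factor(Table$factor_col)
```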

Now I want the mean of V1 for each mouse ID, so I run:

Table2 = Table %>%
  group_by(mouse_ID) %>%
  summarise(mean_V1 = mean(V1, na.rm = TRUE))

But it returns only a single mean of V1 instead of one mean per mouse ID. Is this because the mouse IDs don't all recur with the same frequency?

Thank you!

No, it shouldn't depend on the frequency of the ID. Your code looks good, and when I ran it, I got 18 means for the 18 unique IDs. Is it possible you forgot the `group_by()` in your code? Perhaps restarting and rerunning might help? — brinyprawn, Jul 03 '23 at 19:36
My best guess is that this is a duplicate of https://stackoverflow.com/questions/26106146/why-does-summarize-or-mutate-not-work-with-group-by-when-i-load-plyr-after-dp/26106218#26106218 ... What do you get from `find("summarise")`? — Ben Bolker, Jul 03 '23 at 19:41
Yes, it was because I needed to prefix `summarise` with `dplyr::`. Thanks, @BenBolker! — Chiara Toschi, Jul 11 '23 at 10:28
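The fix in the comments points to masking. If plyr is attached after dplyr, the bare name `summarise` resolves to `plyr::summarise()`, which ignores the grouping and returns a single row. The sketch below calls both versions explicitly on a small made-up data frame (`toy` is hypothetical), so the result does not depend on the order in which packages were loaded. The plyr part runs only if plyr happens to be installed:

```r
library(dplyr)

# Hypothetical toy data: two mice with unequal numbers of measurements
toy <- data.frame(mouse_ID = factor(c(1, 1, 2, 2, 2)),
                  V1 = c(2, 4, 10, 20, 30))

# Which attached packages export a function called "summarise"?
# If "package:plyr" is listed before "package:dplyr", plyr's version wins.
print(find("summarise"))

# dplyr's summarise respects the grouping: one row per mouse
by_mouse <- toy %>%
  group_by(mouse_ID) %>%
  dplyr::summarise(mean_V1 = mean(V1, na.rm = TRUE))
print(by_mouse)

# plyr's summarise ignores the grouping and collapses everything to one row
if (requireNamespace("plyr", quietly = TRUE)) {
  overall <- toy %>%
    group_by(mouse_ID) %>%
    plyr::summarise(mean_V1 = mean(V1, na.rm = TRUE))
  print(overall)
}
```

Writing `dplyr::summarise()` explicitly, as the asker ended up doing, avoids the problem regardless of load order.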

score 0 · Answer 1 · answered Jul 03 '23 at 19:31

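Do you mean `mutate` instead of `summarise`? `mutate` keeps every row and adds each mouse's mean as a new column (the `.by` argument requires dplyr 1.1.0 or later):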

> Table %>%
+     mutate(mean_V1 = mean(V1, na.rm = TRUE), .by = mouse_ID)
   V1 factor_col mouse_ID   mean_V1
1   2          A        1  5.500000
2   9          A        2 13.000000
3  12          A        3 14.000000
4  20          A        4 13.500000
5   2          A        5 11.000000
6  19          A        6 13.000000
7   9          A        7 11.333333
8   7          A        8  7.000000
9  19          A        9 14.000000
10 15          A       10  5.666667
11 11          A       11 14.333333
12 10          A       12 11.333333
13 15          A       13  8.666667
14  6          A       14  5.333333
15 19          A       15 18.500000
16  3          A       16  9.500000
17  1          A       17 10.500000
18 13          A       18 11.000000
19  9          B        1  5.500000
20 17          B        2 13.000000
21 16          B        3 14.000000
22  7          B        4 13.500000
23 20          B        5 11.000000
24  4          B        6 13.000000
25 19          B        7 11.333333
26  9          B        8  7.000000
27 17          B        9 14.000000
28  1          B       10  5.666667
29 15          B       11 14.333333
30 18          B       12 11.333333
31  5          B       13  8.666667
32  9          B       14  5.333333
33 16          C        6 13.000000
34  6          C        7 11.333333
35  5          C        8  7.000000
36  6          C        9 14.000000
37  1          C       10  5.666667
38 17          C       11 14.333333
39  6          C       12 11.333333
40  6          C       13  8.666667
41  1          C       14  5.333333
42 18          C       15 18.500000
43 16          C       16  9.500000
44 20          C       17 10.500000
45  9          C       18 11.000000
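The practical difference between the two verbs is the number of rows returned: `summarise()` collapses each group to one row, while `mutate()` keeps every row and repeats the group's mean. A small sketch on made-up data (`toy` is hypothetical; `.by` needs dplyr 1.1.0 or later):

```r
library(dplyr)

# Hypothetical data: mouse 1 measured twice, mouse 2 three times
toy <- data.frame(mouse_ID = factor(c(1, 1, 2, 2, 2)),
                  V1 = c(2, 4, 10, 20, 30))

# One row per mouse
per_mouse <- toy %>%
  summarise(mean_V1 = mean(V1, na.rm = TRUE), .by = mouse_ID)

# Every original row kept, with its mouse's mean alongside
per_row <- toy %>%
  mutate(mean_V1 = mean(V1, na.rm = TRUE), .by = mouse_ID)

print(per_mouse)  # 2 rows
print(per_row)    # 5 rows
```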

score 0 · Answer 2 · answered Jul 03 '23 at 19:37

Hmm, that's odd; it works as expected for me:

library("random")

df <- data.frame(V1 = randomNumbers(n = 18,min = 1,max = 20, col=1),
                 factor_col = c(rep("A", 18)),
                 mouse_ID = c(1:18))

df2 <- data.frame(V1 = randomNumbers(n = 14,min = 1,max = 20, col=1),
                  factor_col = c(rep("B", 14)),
                  mouse_ID = c(1:14))

df3 <- data.frame(V1 = randomNumbers(n = 13,min = 1,max = 20, col=1),
                  factor_col = c(rep("C", 13)),
                  mouse_ID = c(6:18))

Table = dplyr::bind_rows(df, df2)
Table = dplyr::bind_rows(Table, df3)

Table$mouse_ID = as.factor(Table$mouse_ID)
Table$factor_col = as.factor(Table$factor_col)

Just as a check:

Table |>
  subset(mouse_ID %in% c(1:3)) |>
  dplyr::arrange(mouse_ID, factor_col)
#>   V1 factor_col mouse_ID
#> 1 19          A        1
#> 2 15          B        1
#> 3  4          A        2
#> 4  2          B        2
#> 5  1          A        3
#> 6  6          B        3

And finally:

Table |>
  dplyr::group_by(mouse_ID) |>
  dplyr::summarise(mean_V1 = mean(V1, na.rm = TRUE))
#> # A tibble: 18 × 2
#>    mouse_ID mean_V1
#>    <fct>      <dbl>
#>  1 1          17   
#>  2 2           3   
#>  3 3           3.5 
#>  4 4          12   
#>  5 5          13   
#>  6 6          11   
#>  7 7          15.7 
#>  8 8           6.67
#>  9 9          11   
#> 10 10          9.33
#> 11 11          5   
#> 12 12          3.33
#> 13 13         11.3 
#> 14 14          7.67
#> 15 15         19.5 
#> 16 16         12.5 
#> 17 17         15   
#> 18 18         13

Created on 2023-07-03 with reprex v2.0.2
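On the frequency question: mouse IDs 1 to 5 occur in groups A and B, IDs 6 to 14 in all three groups, and IDs 15 to 18 in A and C, so group sizes differ, and `summarise()` handles that fine. A quick sketch that reports each ID's row count next to its mean. It rebuilds the table with base R's `sample()` instead of random.org, an assumption made only so it runs offline:

```r
library(dplyr)

set.seed(1)  # arbitrary seed; values stand in for the random.org draws

Table <- data.frame(
  V1 = sample(1:20, 45, replace = TRUE),
  factor_col = factor(rep(c("A", "B", "C"), times = c(18, 14, 13))),
  mouse_ID = factor(c(1:18, 1:14, 6:18))
)

# Group sizes differ (2 or 3 rows per mouse), yet summarise()
# still returns exactly one row per mouse_ID
id_summary <- Table |>
  group_by(mouse_ID) |>
  summarise(n = n(), mean_V1 = mean(V1, na.rm = TRUE))

print(id_summary)
```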
