
I am working on a school project and have a data set of 4,000 rows. There are 40 participants, each with about 100 rows. I want to create a data set that collapses the rows for each participant into summary statistics, ideally the 90th percentile. I know how to find the mean values with dplyr:

Means <- bladder %>% 
  group_by(id, group) %>% 
  summarise(across(everything(), list(mean)))

This works great. But is there a way to do the same thing and return the 90th percentiles instead of the means?

Thank you!!

benson23
keherder

2 Answers


The function that calculates percentiles in R is quantile(). Specify probs = 0.9 to get the 90th percentile.
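By default quantile() interpolates between data points (type = 7), which is why the bladder output contains values such as 3.7 and 0.7 even though the raw columns are whole numbers. As a minimal sketch, the hypothetical helper p90_manual() below writes that default rule out by hand for a numeric vector with no missing values. It is only for illustration; in practice you would call quantile() itself.

```r
# Hypothetical helper: R's default (type 7) quantile rule, written out by hand.
# Assumes a numeric vector with no NAs.
p90_manual <- function(x, p = 0.9) {
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1   # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  # linear interpolation between the two neighbouring order statistics
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

p90_manual(c(1, 2, 3, 4))              # 3.7, between the 3rd and 4th values
quantile(c(1, 2, 3, 4), probs = 0.9)   # same value, labelled "90%"
```

With four values the position is 1 + 3 × 0.9 = 3.7, so the result lies 70% of the way from the 3rd to the 4th sorted value.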

Here I use the bladder dataset from the survival package to demonstrate.

library(dplyr)

survival::bladder %>% 
  group_by(id, rx) %>% 
  # probs goes to quantile() via an anonymous function; .groups belongs to summarize()
  summarize(across(everything(), \(x) quantile(x, probs = 0.9)), .groups = "drop")

# A tibble: 85 × 7
      id    rx number  size  stop event  enum
   <int> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1      1     3     1   0     3.7
 2     2     1      2     1     4   0     3.7
 3     3     1      1     1     7   0     3.7
 4     4     1      5     1    10   0     3.7
 5     5     1      4     1    10   0.7   3.7
 6     6     1      1     1    14   0     3.7
 7     7     1      1     1    18   0     3.7
 8     8     1      1     3    18   0.7   3.7
 9     9     1      1     1    18   1     3.7
10    10     1      3     3    23   0     3.7
# … with 75 more rows
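If you want several statistics side by side, across() also accepts a named list of functions and, by default, names the output columns "{.col}_{.fn}". A minimal sketch on a small hypothetical data frame (df, score, and the labels mean and p90 are made up for illustration); names = FALSE only drops the "90%" label that quantile() attaches.

```r
library(dplyr)

# Hypothetical long-format data: two participants with four rows each
df <- tibble(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  group = c("a", "a", "a", "a", "b", "b", "b", "b"),
  score = c(1, 2, 3, 4, 0, 0, 0, 1)
)

# One row per participant, with a mean and a 90th-percentile column for each
# numeric variable (grouping columns are excluded from across() automatically)
summary_df <- df %>%
  group_by(id, group) %>%
  summarise(
    across(where(is.numeric),
           list(mean = mean, p90 = \(x) quantile(x, probs = 0.9, names = FALSE))),
    .groups = "drop"
  )

summary_df
# columns: id, group, score_mean, score_p90
```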
benson23
Using this code, I am getting the following error: "Error in `summarise()`: ! Problem while computing `..1 = across(everything(), quantile, probs = 0.9, .groups = "drops")`. ℹ The error occurred in group 1: ID = "ChIJ---TpZ6tEmsR8snxPmsJt0w". Caused by error in `across()`: ! Problem while computing column `column_label`. Caused by error in `(1 - h) * qs[i]`: ! non-numeric argument to binary operator Run `rlang::last_error()` to see where the error occurred." – Kumar Nov 16 '22 at 05:38
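The failing expression in that message, (1 - h) * qs[i], is the interpolation step inside quantile(), and the column it names, column_label, is non-numeric (probably character). across(everything(), ...) sends every non-grouping column to quantile(), text columns included. A minimal sketch of one way around it is to restrict across() to numeric columns with where(is.numeric); the data frame below is hypothetical, loosely shaped after the column names in the error.

```r
library(dplyr)

# Hypothetical data: a character column sits next to the numeric measurements
df <- tibble(
  ID           = c("a", "a", "a", "b", "b", "b"),
  column_label = c("x", "y", "z", "x", "y", "z"),
  value        = c(1, 2, 3, 10, 20, 30)
)

# where(is.numeric) skips column_label, so quantile() only ever sees numbers
p90 <- df %>%
  group_by(ID) %>%
  summarise(across(where(is.numeric), \(x) quantile(x, probs = 0.9, names = FALSE)),
            .groups = "drop")

p90
```

Dropping the offending column explicitly, for example across(-column_label, ...), would also work, but where(is.numeric) keeps working if more text columns are added later.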

The following code also gives the solution and additionally handles missing values:

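library(dplyr)

# na.rm = TRUE drops missing values before each percentile is computed
Percentile90 <- survival::bladder %>% 
  group_by(id, rx) %>% 
  summarise(across(everything(), \(x) quantile(x, probs = 0.9, na.rm = TRUE)),
            .groups = "drop")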
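na.rm = TRUE matters because quantile() stops with an error, rather than returning NA, when its input contains missing values. A small base-R check of that behaviour on a made-up vector:

```r
x <- c(1, 2, 3, 4, NA)

# Without na.rm, quantile() signals an error on NA input; capture its message
msg <- tryCatch(quantile(x, probs = 0.9), error = function(e) conditionMessage(e))
print(msg)

# With na.rm = TRUE the NA is removed first, leaving 1, 2, 3, 4
p90 <- quantile(x, probs = 0.9, na.rm = TRUE, names = FALSE)
print(p90)  # 3.7
```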
Kumar