
If I have a simple data frame with 2 factors (a and b), each with 2 levels (1 and 2), and 1 variable (x), how do I get the median of x over each level of factor a, over each level of factor b, and over each combination of a*b?

library(dplyr)
df <- data.frame(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
                 b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
                 x = runif(16))

I've tried various (many) versions of:

df %>%
   group_by_(c("a", "b")) %>%
   summarize(med_rate = median(df$x))
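The attempt above has two problems. Inside `summarize()`, `df$x` reaches outside the pipeline and grabs the whole, ungrouped `x` column, so every group gets the overall median; writing the bare name `x` is what lets dplyr evaluate it once per group. Also, `group_by_()` is a deprecated variant; `group_by()` takes the column names unquoted. A minimal sketch of the three summaries, assuming dplyr 1.0 or later (for the `.groups` argument). The data are rebuilt with `set.seed()` so the block runs on its own, and the names `by_a`, `by_b`, `by_ab` and `median_x` are only illustrative.

```r
library(dplyr)

# Same layout as the question's data; set.seed() only makes the run repeatable
set.seed(123)
df <- data.frame(a = factor(rep(1:2, each = 8)),
                 b = factor(rep(rep(1:2, each = 4), times = 2)),
                 x = runif(16))

# Median of x for each level of a (bare `x`, not df$x, so grouping applies)
by_a <- df %>% group_by(a) %>% summarize(median_x = median(x))

# Median of x for each level of b
by_b <- df %>% group_by(b) %>% summarize(median_x = median(x))

# Median of x for each combination of a and b; drop the grouping afterwards
by_ab <- df %>%
  group_by(a, b) %>%
  summarize(median_x = median(x), .groups = "drop")
```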

The results should look like this for the median x of each level of factor a:

a median
1 0.58811
2 0.53167

And like this for the median x of each level of factor b:

b median
1 0.60622
2 0.46096

And like this for the median x for each combination of a and b:

a b median
1 1 0.66745
1 2 0.34656
2 1 0.50903
2 2 0.55990

Thanks in advance for any help.

Richard Telford
David G
  • take the `df$` out of the `summarise` – Richard Telford May 25 '17 at 16:32
  • You don't need quotes and you can use `group_by` i.e. `df %>% group_by(a, b) %>% summarize(med_rate = median(x))` – akrun May 25 '17 at 16:39
  • Thanks. But this gives me one median value: the median of x over all 16 observations. It doesn't give me the median for each level (1 and 2) of each factor (a and b) or for each a*b combination. – David G May 25 '17 at 17:06
  • @DavidG It does give me the median for each level, i.e. 4 values. Perhaps you have loaded the `plyr` library too. Try `df %>% group_by(a, b) %>% dplyr::summarize(med_rate = median(x))` – akrun May 25 '17 at 17:19
  • Yes! Thank you very much! – David G May 25 '17 at 17:31
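The symptom described in the comments (one overall median instead of one per group) is what happens when plyr is attached after dplyr: plyr's `summarize()` then masks dplyr's and does not know about dplyr's groups, so it returns a single row. A short sketch for checking which package the unqualified name resolves to, using base R's `find()`; calling `dplyr::summarize()` explicitly avoids the problem regardless of load order. The data are rebuilt here so the block runs on its own, and `by_ab` is an illustrative name.

```r
library(dplyr)

set.seed(123)
df <- data.frame(a = factor(rep(1:2, each = 8)),
                 b = factor(rep(rep(1:2, each = 4), times = 2)),
                 x = runif(16))

# Lists every attached package that exports `summarize`, in search order.
# If "package:plyr" appears before "package:dplyr", plyr's version wins.
find("summarize")

# Qualifying the call with dplyr:: sidesteps any masking.
by_ab <- df %>%
  group_by(a, b) %>%
  dplyr::summarize(med_rate = median(x), .groups = "drop")
```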

2 Answers

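set.seed(123) # make the example reproducible
library(data.table)
df <- data.table(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
                 b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
                 x = runif(16))

df[, .(median_x = median(x)), by = a]        # median of x for each level of a
df[, .(median_x = median(x)), by = b]        # median of x for each level of b
df[, .(median_x = median(x)), by = .(a, b)]  # median of x for each a*b combination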
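The three data.table calls can also be collapsed into one with data.table's `groupingsets()`, which evaluates the same `j` expression for several groupings and stacks the results. A sketch, assuming a data.table version recent enough to include `groupingsets()`; rows that come from a single-factor grouping have `NA` in the other factor's column. The names `dt`, `res` and `median_x` are illustrative.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = factor(rep(1:2, each = 8)),
                 b = factor(rep(rep(1:2, each = 4), times = 2)),
                 x = runif(16))

# One call, three groupings: by a, by b, and by a and b together.
# The sets are stacked in the order given: 2 + 2 + 4 = 8 rows.
res <- groupingsets(dt,
                    j = list(median_x = median(x)),
                    by = c("a", "b"),
                    sets = list("a", "b", c("a", "b")))
res
```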
simone

The following is not very elegant, but it creates a single data frame that matches your expected result.

It creates three data frames (for a, b and a*b) and combines them into one.

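library(dplyr)

bind_rows(
  # median of x for each level of a
  a = df %>% 
    group_by(a) %>% 
    rename(factor = a) %>% 
    summarize(med_rate = median(x)),
  # median of x for each level of b
  b = df %>% 
    group_by(b) %>% 
    rename(factor = b) %>% 
    summarize(med_rate = median(x)),
  # We create a column for grouping a*b
  `a*b` = df %>% 
    mutate(factor = paste(a, b)) %>% 
    group_by(factor) %>% 
    summarize(med_rate = median(x)),
  # record which grouping each row came from
  .id = "grouping"
)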
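To cross-check the combined table, or to get the same three summaries without any package, base R's `aggregate()` with a formula works too. A minimal sketch, rebuilding the question's data layout with `set.seed()` so it runs on its own; the value column keeps the name `x` in each result, and `med_a`, `med_b`, `med_ab` are illustrative names.

```r
set.seed(123)
df <- data.frame(a = factor(rep(1:2, each = 8)),
                 b = factor(rep(rep(1:2, each = 4), times = 2)),
                 x = runif(16))

# Formula interface: left side is the value, right side the grouping factor(s)
med_a  <- aggregate(x ~ a, data = df, FUN = median)      # one row per level of a
med_b  <- aggregate(x ~ b, data = df, FUN = median)      # one row per level of b
med_ab <- aggregate(x ~ a + b, data = df, FUN = median)  # one row per a*b combination
```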
Juan Bosco