I am trying to calculate the median (but that could be substituted by similar metrics) by group for multiple columns based on subsets defined by other columns. This is direct follow-on question from this previous post of mine. I have attempted to incorporate calculating the median via aggregate
into the Map(function(x,y) dosomething, x, y)
solution kindly provided by @Frank, but that didn't work. Let me illustrate:
Calculate median for A and B by groups GRP1 and GRP2
df <- data.frame(GRP1 = c("A","A","A","A","A","A","B","B","B","B","B","B"), GRP2 = c("A","A","A","B","B","B","A","A","A","B","B","B"), A = c(0,4,6,7,0,1,9,0,0,8,3,4), B = c(6,0,4,8,6,7,0,9,9,7,3,0))
med <- aggregate(.~GRP1+GRP2,df,FUN=median)
Simple. Now add columns defining which rows to be used for calculating the median, i.e. rows with NAs should be dropped, column a defines which rows to be used for calculating the median in column A, same for columns b and B:
a <- c(1,4,7,3,NA,3,7,NA,NA,4,8,1)
b <- c(5,NA,7,9,5,6,NA,8,1,7,2,9)
df1 <- cbind(df,a,b)
As mentioned above, I have tried combining Map
and aggregate
, but that didn't work. I assume that Map
doesn't know what to do with GRP1 and GRP2.
med1 <- Map(function(x,y) aggregate(.~GRP1+GRP2,df1[!is.na(y)],FUN=median), x=df1[,3:4], y=df1[, 5:6])
This is the result I'm looking for:
GRP1 GRP2 A B
1 A A 4 5
2 B A 9 9
3 A B 4 7
4 B B 4 3
Any help will be much appreciated!