Median of a subset of columns where value of other columns is 1

Question

Very similar question to this one, however there are some fundamental differences.

I have a data set of a Timestamp, 4 measurement columns, and 4 state columns:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor")), .Names = c("Timestamp", "A", "B", 
"C", "D", "SA", "SB", "SC", "SD"), row.names = c(NA, 6L), class = "data.frame")

I want to calculate the median of the columns that are on, as indicated by a 1 in the S* columns.

So far I can find which measurement columns to use row by row using this:

foo[i, c(which(x = foo[i, 6:9] == 1, arr.ind = FALSE) + 1)]

where i is the row number.

And this is as far as I get without my code getting too complicated. I was thinking I could create a new data frame by binding the columns I got with the line of code above (after a row by row for loop) to the Timestamp, fill in empty spots with NAs, calculate the median of that data frame, and finally bind the median to the original data frame. But there has to be a better way!

Any ideas?

Edit:

Output should look like this:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), Median = c(18.51558, 18.756115, 
19.262825, 18.757795, 18.89796, 19.056395)), .Names = c("Timestamp", 
"A", "B", "C", "D", "SA", "SB", "SC", "SD", "Median"), row.names = c(NA, 
6L), class = "data.frame")

What's your intended result for the example data? – Thomas Sep 16 '14 at 14:41 — Thomas, Sep 16 '14 at 14:41
@Thomas I've added the result data frame. – americo Sep 16 '14 at 14:49 — americo, Sep 16 '14 at 14:49
Sounds like a job just begging for the `plyr` package. – Carl Witthoft Sep 16 '14 at 14:54 — Carl Witthoft, Sep 16 '14 at 14:54

score 1 · Accepted Answer · answered Sep 16 '14 at 15:01

This is a little messy because your S* columns are factor. If you converted them to numeric or logical you could skip much of the second line of code below:

w <- grepl("^S", names(foo))
m <- matrix(as.logical(as.numeric(as.matrix(foo[, w]))), ncol = sum(w))
foo$Median <- apply(`[<-`(as.matrix(foo[,LETTERS[1:4]]), !m, NA), 1, median, na.rm=TRUE)
foo
#             Timestamp A        B        C D SA SB SC SD   Median
# 1 2014-09-01 06:00:02 0 20.77579 16.25537 0  0  1  1  0 18.51558
# 2 2014-09-01 06:00:06 0 21.05727 16.45496 0  0  1  1  0 18.75612
# 3 2014-09-01 06:00:10 0 21.81632 16.70933 0  0  1  1  0 19.26282
# 4 2014-09-01 06:00:14 0 21.36299 16.15260 0  0  1  1  0 18.75780
# 5 2014-09-01 06:00:18 0 21.18629 16.60963 0  0  1  1  0 18.89796
# 6 2014-09-01 06:00:22 0 21.34721 16.76558 0  0  1  1  0 19.05640

Awesome! Can you explain or point me towards a resource explaining ``[<-``? I've never seen that before. — americo, Sep 16 '14 at 15:09
@amzu `[<-` is assignment to a subset of the lefthand side (it's the same as something like `a[b] <- c` but instead written as `[<-(a,b,c)` and does not change the original `a` object. I use that construction in order to reduce two lines of code to one. — Thomas, Sep 16 '14 at 15:14

Median of a subset of columns where value of other columns is 1

1 Answers1