1

Very similar question to this one, however there are some fundamental differences.

I have a data set of a Timestamp, 4 measurement columns, and 4 state columns:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor")), .Names = c("Timestamp", "A", "B", 
"C", "D", "SA", "SB", "SC", "SD"), row.names = c(NA, 6L), class = "data.frame")

I want to calculate the median of the columns that are on, as indicated by a 1 in the S* columns.

So far I can find which measurement columns to use row by row using this:

foo[i, c(which(x = foo[i, 6:9] == 1, arr.ind = FALSE) + 1)]

where i is the row number.

And this is as far as I get without my code getting too complicated. I was thinking I could create a new data frame by binding the columns I got with the line of code above (after a row by row for loop) to the Timestamp, fill in empty spots with NAs, calculate the median of that data frame, and finally bind the median to the original data frame. But there has to be a better way!

Any ideas?

Edit:

Output should look like this:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), Median = c(18.51558, 18.756115, 
19.262825, 18.757795, 18.89796, 19.056395)), .Names = c("Timestamp", 
"A", "B", "C", "D", "SA", "SB", "SC", "SD", "Median"), row.names = c(NA, 
6L), class = "data.frame")
Community
  • 1
  • 1
americo
  • 1,013
  • 8
  • 17

1 Answers1

1

This is a little messy because your S* columns are factor. If you converted them to numeric or logical you could skip much of the second line of code below:

w <- grepl("^S", names(foo))
m <- matrix(as.logical(as.numeric(as.matrix(foo[, w]))), ncol = sum(w))
foo$Median <- apply(`[<-`(as.matrix(foo[,LETTERS[1:4]]), !m, NA), 1, median, na.rm=TRUE)
foo
#             Timestamp A        B        C D SA SB SC SD   Median
# 1 2014-09-01 06:00:02 0 20.77579 16.25537 0  0  1  1  0 18.51558
# 2 2014-09-01 06:00:06 0 21.05727 16.45496 0  0  1  1  0 18.75612
# 3 2014-09-01 06:00:10 0 21.81632 16.70933 0  0  1  1  0 19.26282
# 4 2014-09-01 06:00:14 0 21.36299 16.15260 0  0  1  1  0 18.75780
# 5 2014-09-01 06:00:18 0 21.18629 16.60963 0  0  1  1  0 18.89796
# 6 2014-09-01 06:00:22 0 21.34721 16.76558 0  0  1  1  0 19.05640
Thomas
  • 43,637
  • 12
  • 109
  • 140
  • Awesome! Can you explain or point me towards a resource explaining ``[<-``? I've never seen that before. – americo Sep 16 '14 at 15:09
  • 1
    @amzu `[<-` is assignment to a subset of the lefthand side (it's the same as something like `a[b] <- c` but instead written as `[<-(a,b,c)` and does not change the original `a` object. I use that construction in order to reduce two lines of code to one. – Thomas Sep 16 '14 at 15:14