I'm trying to understand how to aggregate my output. I've created some dummy data that approximates my actual data, which has hundreds of levels of group1, 3 levels of group2, and several dozen validation logicals. Apologies if this seems simple; I've hunted and pecked a lot, and I have to say that, as a newbie to R, the huge variety of tools out there (the apply family, ddply, aggregate, table, reshape, etc.) is both wonderful and a bit scary :)

 #create data
group1 <- paste("Group", rep(LETTERS[1:7], sep=''))
group2 <- c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC")
valid1 <- c("Y", "N", NA, "N", "Y", "Y", "N")
valid2 <- c("N", "N", "Y", "N", "N", "Y", "N")
valid3 <- c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7)
valid4 <- c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71)
valid5 <- c(8.5, 11.2,NA, NA, 8.3, NA, 11.7)

testdata <- data.frame(cbind(group, group2, valid1, valid2, valid3, valid4, valid5))

valid <- function(testdata){
  for(i in group)
    val1 <- ifelse(valid1=="Y", 1,0)
     val2 <- ifelse(valid2=="Y", 1,0)
      val3 <- ifelse(valid3>=1.0, 1,0)
      val4 <- ifelse(valid4<=0.5, 1,0)
       val5 <- ifelse(valid5>=10.0, 1,0)

  test.out <- data.frame(cbind(group1,group2, val1, val2, val3, val4, val5))

}
validtry <- valid(testdata)

Then, I need to turn these logicals into numeric so they can be summed:

#make validations numeric
# why doesn't this work:
# validtry[,3:7] <- as.numeric(validtry[,3:7])
#but these do
validtry[,3] <- as.numeric(validtry[,3])
validtry[,4] <- as.numeric(validtry[,4])
validtry[,5] <- as.numeric(validtry[,5])
validtry[,6] <- as.numeric(validtry[,6])
validtry[,7] <- as.numeric(validtry[,7])
######

#summarize validtry
#sum on both groups
aggregate(validtry[,3:7], by=list(validtry$group1, validtry$group2), sum, na.rm=T)

#sum on one group
aggregate(validtry[,3:7], by=list(validtry$group2), sum, na.rm=T)

So, these last two get me close, but I think I need something different. I'm trying to sum across both rows and columns for the two groups. I'm familiar with tapply, but that doesn't seem to get me there.

thanks in advance!!

  • You don't need `data.frame(cbind(..`; instead use `testdata <- data.frame(group1, group2, valid1, valid2, valid3, valid4, valid5)` – akrun Sep 06 '14 at 16:38
  • You also don't need `as.numeric` to sum logical vectors. They have numeric values of zero for FALSE and 1 for TRUE. – Rich Scriven Sep 06 '14 at 16:39
  • You need to fix the `group1` name. It's 'group1' when created and then just 'group' later. And if you don't say what the right answers are, then the goal of "sum across both rows and columns for the two groups" is too vague to know how to implement correctly. – IRTFM Sep 06 '14 at 16:45
  • @Michael Slattery In your `group1`, there are seven unique `levels`, and your example dataset has an `nrow` of 7, so each `group1` level holds a single value in every `valid` column. It is not clear what you expect as a result. It would have been easier if you had also shown the expected output. – akrun Sep 06 '14 at 17:48
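
The points raised in the comments can be checked with a short sketch. It uses a small made-up data frame (`df`, with columns `g`, `a` and `b`; these names are illustrative and not from the question) to show that logical columns sum directly, and that `as.numeric` fails on several columns at once because a multi-column selection is a list rather than a vector. Converting column by column, for example with `lapply`, avoids the problem.

```r
# Hypothetical toy data; the names df, g, a, b are illustrative only
df <- data.frame(g = c("x", "x", "y"),
                 a = c(TRUE, FALSE, NA),
                 b = c("1", "0", "1"),
                 stringsAsFactors = FALSE)

# Logicals sum directly: TRUE counts as 1, FALSE as 0
n_true <- sum(df$a, na.rm = TRUE)

# as.numeric() on several columns at once fails, because the
# selection is a list (a data frame), not an atomic vector
res <- tryCatch(as.numeric(df[, 2:3]), error = function(e) "error")

# Converting one column at a time works
df[, 2:3] <- lapply(df[, 2:3], as.numeric)
```

If the columns are factors rather than character vectors, `as.numeric` returns the internal level codes, so `as.numeric(as.character(x))` is the safer conversion in that case.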

1 Answer

The expected output is not clear. My guess is:

 testdata <- data.frame(group1, group2, valid1, valid2, valid3, valid4, valid5)
 str1 <- c("valid1=='Y'", "valid2=='Y'", "valid3>=1.0", "valid4 <=0.5", "valid5>=10.0")
 validtry <- testdata

 #Though I used eval(parse(...)) here, it is generally not recommended
 validtry[,-(1:2)] <- lapply(str1, function(x) 1*with(testdata, eval(parse(text=x))))

 library(reshape2) 
 lst <-  lapply(validtry[3:7], function(x)
       dcast(data.frame(validtry[1:2], x), group1~group2, value.var="x", sum, na.rm=TRUE))

 lst[[1]]
 #   group1 LS SS UNC
 #1 Group A  0  0   1
 #2 Group B  0  0   0
 #3 Group C  0  0   0
 #4 Group D  0  0   0
 #5 Group E  1  0   0
 #6 Group F  0  1   0
 #7 Group G  0  0   0
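
As an alternative sketch that avoids `eval(parse(...))`, the same 0/1 indicators can be built from a named list of predicate functions. The list name `checks` and the result name `by_group2` are introduced here for illustration; the thresholds are the ones from the question. The final `total` column reflects only one possible reading of "sum across both rows and columns", since the expected output was not given.

```r
# Data recreated from the question
group1 <- paste("Group", LETTERS[1:7])
group2 <- c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC")
valid1 <- c("Y", "N", NA, "N", "Y", "Y", "N")
valid2 <- c("N", "N", "Y", "N", "N", "Y", "N")
valid3 <- c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7)
valid4 <- c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71)
valid5 <- c(8.5, 11.2, NA, NA, 8.3, NA, 11.7)
testdata <- data.frame(group1, group2, valid1, valid2, valid3,
                       valid4, valid5, stringsAsFactors = FALSE)

# One predicate per validation column (thresholds from the question)
checks <- list(
  valid1 = function(x) x == "Y",
  valid2 = function(x) x == "Y",
  valid3 = function(x) x >= 1.0,
  valid4 = function(x) x <= 0.5,
  valid5 = function(x) x >= 10.0
)

# Apply each predicate to its column; NA inputs stay NA
validtry <- testdata
validtry[names(checks)] <- Map(function(f, col) as.integer(f(col)),
                               checks, testdata[names(checks)])

# Passes per group2 level, one column per validation
by_group2 <- aggregate(validtry[names(checks)],
                       by = list(group2 = validtry$group2),
                       FUN = sum, na.rm = TRUE)

# Assumed reading of "rows and columns": total passes per group2 level
by_group2$total <- rowSums(by_group2[names(checks)])
```

Keeping the rules in a named list means a new validation column only needs one more entry in `checks`, which may matter with several dozen validations.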
akrun