I found a one-liner that produces conditional frequencies, and I would like to wrap it in a function that runs over every variable:

do.call(data.frame, aggregate(X1 ~ X2, data=dat, FUN=table))

I also managed to get column names by their index number, following another thread, using name <- names(dataset)[index].


I want to get the frequency of Xn ~ Xstatic, where Xn is each of the other n-1 variables in turn and Xstatic is the variable of interest.

So far I have written a for loop. Here is my code:

library(prodlim)

NUM <- 100
dat1 <- SimSurv(NUM)
dat1$time <- sample(24:160,NUM,rep=TRUE)
dat1$X3 <- sample(0:1,NUM,rep=TRUE)
dat1$X4 <- sample(0:9,NUM,rep=TRUE)
dat1$X5 <- sample(c("a","b","c"),NUM,rep=TRUE)
dat1$X6 <- sample(c("was","que","koa","sim","sol"),NUM,rep=TRUE)
dat1$X7 <- sample(1:99,NUM,rep=TRUE)
dat1$X8 <- sample(1:200,NUM,rep=TRUE)
attach(dat1)

# EXAMPLE
# do.call(data.frame, aggregate(status ~ X6, data=dat1, FUN=table))

for( i in 1:ncol(dat1) ) {
  name <- names(dat1)[i]
  do.call(data.frame, aggregate(name ~ X6, data=dat1, FUN=table))  
}

I get the error below and I am at a loss on how to solve this. All help is appreciated.

 Error in model.frame.default(formula = name ~ X6, data = dat1) : 
   variable lengths differ (found for 'X6') 

1 Answer

1) I would suggest not using attach;

2) it is meaningless to tabulate your variable of interest against some of these other variables, for instance the continuous ones, or the ones sampled from 99 and 200 possible values;

3) why would you want to combine your results into a data frame? Just print them or save them to a list:

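# One two-way table per variable, each cross-tabulated against X6
mylist <- list()
for ( i in c('status','X2','X3','X4','X5','X7','X8') ) {
  # store the table under the variable's name
  mylist[[i]] <- table(dat1[[i]], dat1$X6)
}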
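As for the error in the question: in `name ~ X6` the formula looks for a variable literally called `name`, finds the length-1 string held in it, and so reports that the variable lengths differ. If a formula interface is still wanted, one way around this is to build the formula from strings with `reformulate()`. The following is a minimal sketch, not the asker's exact setup: `dat`, `vars` and `res` are hypothetical names, and `dat` is a small stand-in for `dat1` so that the code runs without prodlim.

```r
# Hypothetical stand-in for dat1 (assumption: only a few categorical columns)
set.seed(1)
n <- 100
dat <- data.frame(
  status = sample(0:1, n, replace = TRUE),
  X3     = sample(0:1, n, replace = TRUE),
  X5     = sample(c("a", "b", "c"), n, replace = TRUE),
  X6     = sample(c("was", "que", "koa", "sim", "sol"), n, replace = TRUE)
)

vars <- c("status", "X3", "X5")   # variables to tabulate against X6
res  <- list()
for (v in vars) {
  # reformulate(c(v, "X6")) builds the one-sided formula  ~ v + X6
  # from character strings, which is what xtabs() expects
  f <- reformulate(c(v, "X6"))
  res[[v]] <- xtabs(f, data = dat)
}

res$status   # counts of status (rows) by X6 (columns)
```

For the two-sided form used with `aggregate()`, `reformulate("X6", response = v)` builds `v ~ X6` in the same way.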
rawr
  • Thank you! I'm actually inserting them in a data frame because I need them in a CSV file... –  Jan 13 '14 at 03:39
  • Okay.. `tmp <- do.call(rbind, mylist)`; `write.csv(tmp, file = 'tmp.csv')` – rawr Jan 13 '14 at 03:48
  • Thanks, it works great! Is it possible to add which variable the values were compared against (y-axis)? I can get the order from the logs, but it could cause confusion later on. –  Jan 13 '14 at 03:52
  • That's one of the reasons I wouldn't combine them into a data frame. This is very hacky, but `mylist <- NULL; for ( i in c('status','X2','X3','X4','X5','X7','X8') ) { mylist <- rbind(mylist, rbind(i, table(dat1[ , i], dat1$X6))) }; write.csv(mylist, file = 'tmp.csv', quote = FALSE)` – rawr Jan 13 '14 at 04:06
  • Thank you! Oh if that's the case what do you recommend? –  Jan 13 '14 at 04:18
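
One possible way to keep track of which variable each count belongs to, offered only as a sketch, is to write the tables in long format with an explicit `variable` column rather than stacking the matrices. Here `dat`, `vars` and `long` are hypothetical names, `dat` is a small stand-in for `dat1`, and the file goes to a temporary directory.

```r
# Hypothetical stand-in for dat1 (assumption: only a few categorical columns)
set.seed(1)
n <- 100
dat <- data.frame(
  status = sample(0:1, n, replace = TRUE),
  X3     = sample(0:1, n, replace = TRUE),
  X5     = sample(c("a", "b", "c"), n, replace = TRUE),
  X6     = sample(c("was", "que", "koa", "sim", "sol"), n, replace = TRUE)
)

vars <- c("status", "X3", "X5")

# One block of rows per variable: variable name, its value, the X6 level, the count
long <- do.call(rbind, lapply(vars, function(v) {
  tab <- as.data.frame(table(value = dat[[v]], X6 = dat$X6),
                       stringsAsFactors = FALSE)
  cbind(variable = v, tab)
}))

head(long)
write.csv(long, file = file.path(tempdir(), "tmp.csv"), row.names = FALSE)
```

Each row of the CSV then carries its own label, so the order of the loop no longer matters when reading the file back.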