-2

For a range of columns (variables) in a data frame, I need to generate tables for each level of each variable. I have created a function that finds the first column in the range (Q1a) and then runs a for loop for every column after. The loop finds the number of levels in each column and then a second for loop is supposed to generate the table and save it as a .csv with a unique file name.

Example data frame:

df <- data.frame(Organization = c("A", "B", "C", "D"), Gender = c("Male", "Female", "Male", "Female"), Q1a = c(1, 2, 3, 4), Q1b = c(1, 2, 1, 2))

df Returns:

  Organization Gender Q1a Q1b
1            A   Male   1   1
2            B Female   2   2
3            C   Male   3   1
4            D Female   4   2

Here is the function I wrote:

library(dplyr)
library(reshape2)
f = function(df) {
  a <- which(colnames(df) == "Q1a")
  for(i in colnames(df[, a:length(df)])) {
      levelsList <- as.list(levels(i))
      for(j in levelsList) {
          temp <- subset(df, i == j, select = c(Organization, Gender, i))
          temp <- group_by(temp, Organization, Gender)
          table <- summarize(temp, count = n())
          table <- dcast(table, Organization ~ Gender)
          table <- mutate(table, Total = Female + Male)
          write.csv(table, paste0(i, "_", j, ".csv"))
      }
   }
}

The expected output for a larger dataset would look like:

    Organization    Female  Male    Total
1          A          366    259      625
2          B          30     35       65
3          C          83     61       144
4          D          55     58       113

I can generate a single table like the one above correctly with a few lines of code. The problem is that the function does not generate any files. Am I on the right track? Is there an easier way to do this using apply?

Nick Holt
  • 116
  • 6
  • please include your expected output. It is not clear from your post. – lmo Jul 28 '16 at 18:42
  • 1
    You are missing the closing parenthesis on `write.csv`. – aosmith Jul 28 '16 at 18:51
  • This isn't, IMO, an appropriate question for SO. Basic issues, like making sure parentheses are all closed, should be fixed before posting a question. – dayne Jul 28 '16 at 18:54
  • You are correct. My apologies. However, making that edit does not make the script functional. I have appended my submission appropriately. Thanks for the feedback. – Nick Holt Jul 28 '16 at 19:06

1 Answers1

0

A few issues that I changed. Take a look at what the value of i is in your loop. It's the string value of the column name from df. When passed to levels() all you will ever get is NULL. If you really want to use levels() you need to convert those columns to factors, or else substitute unique().

See the warning section under ?subset for using this function, I've changed the code to use [ for subsetting the data frame.

Also, I assumed that you meant to output table to the CSV instead of org.sum which does not exist in your example.

library(dplyr)
library(reshape2)

df <- data.frame(Organization = c("A", "A", "B", "C", "D"), Gender = c("Male", "Female", "Female", "Male", "Female"), Q1a = as.factor(c(1, 1, 2, 3, 4)), Q1b = as.factor(c(1, 1, 2, 1, 2)))
f = function(df) {
  a <- which(colnames(df) == "Q1a")
  for(i in colnames(df[, a:length(df)])) {
    levelsList <- levels(df[,i])
    for(j in levelsList) {
      temp <- df[df[,i] == j, c('Organization', 'Gender', i)] 
      temp <- group_by(temp, Organization, Gender)
      table <- summarize(temp, count = n())
      table <- dcast(table, Organization ~ Gender, value.var='count')
      if (dim(table)[2] > 2) {table$Total <- rowSums(table[,c('Male', 'Female')])}
      else {table$Total <- table[,2]}
      write.csv(table, paste0(i, "_", j, ".csv"), row.names = F)
    }
  }
}

f(df)
Jacob F
  • 366
  • 1
  • 6
  • Using unique() instead of levels() in your solution worked beautifully. Your subsetting tweak was definitely the missing piece. And yes, org.sum is a typo. Thank you for the helpful response! Cheers. – Nick Holt Jul 28 '16 at 20:49