7

I have a data frame of factors called questions

q1 q2 q3
A  A  B
C  A  A
A  B  C

That I want to reshape into

question answer freq
1        A      2
1        B      0
1        C      1
2        A      2
2        B      1
2        C      0
3        A      1
3        B      1
3        C      1

I feel like there should be a way to this with reshape2 or plyr, but I couldn't figure it out.

Instead, I did the following:

tbl <- data.frame()
for(i in 1:dim(questions)[2]){
    subtable <- cbind(question = rep(i, 3),
                      as.data.frame(table(questions[i])))
    tbl <- rbind(tbl, subtable)
}

Is there a cleaner approach to reshaping this table?

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
user2689931
  • 383
  • 1
  • 6
  • 15

3 Answers3

5

Here's a base R approach that is similar in concept to what @akrun has posted. I haven't bothered with the cleaning up since that's mostly cosmetic and isn't related to the concept of the question.

The general approach would be:

data.frame(table(stack(mydf))

However, stack won't work with factors, so if your data are factors and not characters, you will have to convert using as.character first, like this:

data.frame(table(stack(lapply(mydf, as.character))))
#   values ind Freq
# 1      A  q1    2
# 2      B  q1    0
# 3      C  q1    1
# 4      A  q2    2
# 5      B  q2    1
# 6      C  q2    0
# 7      A  q3    1
# 8      B  q3    1
# 9      C  q3    1

Moving away from "plyr" and "reshape2" and instead towards "dplyr" and "tidyr", you can try:

library(dplyr)
library(tidyr)

mydf %>% 
  gather(question, answer, everything()) %>%  ## Get the data into a long form
  group_by(question, answer) %>%              ## Group by both question and answer columns
  summarise(freq = n()) %>%                   ## Calculate the relevant frequency
  right_join(expand(., question, answer))     ## Merge with all combinations of Qs and As
# Joining by: c("question", "answer")
# Source: local data frame [9 x 3]
# Groups: question
# 
#   question answer freq
# 1       q1      A    2
# 2       q1      B   NA
# 3       q1      C    1
# 4       q2      A    2
# 5       q2      B    1
# 6       q2      C   NA
# 7       q3      A    1
# 8       q3      B    1
# 9       q3      C    1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • another one-liner: `as.data.frame(table(unlist(dat), rep(names(dat), nrow(dat)), dnn = list('answers','questions')))` – rawr Apr 17 '15 at 16:12
3

Try

library(qdapTools)
library(reshape2)
colnames(questions) <- sub('\\D+', '', colnames(questions))
setNames(melt(as.matrix(mtabulate(questions))), 
                      c('question', 'answer', 'freq'))

Or using data.table

library(data.table)#v.1.9.5+
setkey(
    setnames(
      melt(setDT(questions, keep.rownames=TRUE), id.var='rn',
             value.name='answer')[, list(freq=.N),
                  by=list(variable, answer)],
           'variable', 'question'), 
                  question, answer)[
       CJ(question=unique(question), answer=unique(answer))][
                 is.na(freq), freq:=0][]
 #   question answer freq
 #1:        1      A    2
 #2:        1      B    0
 #3:        1      C    1
 #4:        2      A    2
 #5:        2      B    1
 #6:        2      C    0
 #7:        3      A    1
 #8:        3      B    1
 #9:        3      C    1
akrun
  • 874,273
  • 37
  • 540
  • 662
3

Yeah, it's a little tricky because of the zeros. After melting, instead of casting directly into the form you need, cast into wide form and then melt again. There's probably a just as easy was using base R and table though.

d <- read.table(text="q1 q2 q3
                       A  A  B
                       C  A  A
                       A  B  C", header=TRUE, as.is=TRUE)
melt(dcast(melt(d, measure.vars=1:3), value ~ variable))

## Aggregation function missing: defaulting to length
## Using value as id variables
##   value variable value
## 1     A       q1     2
## 2     B       q1     0
## 3     C       q1     1
## 4     A       q2     2
## 5     B       q2     1
## 6     C       q2     0
## 7     A       q3     1
## 8     B       q3     1
## 9     C       q3     1
Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142