R: Counting appearances of strings for each column in data frame

Question

I have a data drame with results for a survey with 1 column for each question. There are only 5 different satisfaction levels(VDSAT,DSAT,NTL,SAT,VSAT), so each column are a bunch of satisfaction levels.

I would like to have a summary data.frame that counts, for each column, how many different satisfaction levels did I get (put 0 if no occurrencies). Each column would be a question, each row a satisfaction level and the intersections the counts.

I have tried with apply(df,2,table) which will give me a list of tables and then put into data.frame format with data.frame(matrix(unlist(table),nrow=5)).

This approach works well if I have at least 1 result of each satisfaction level for every column. If one column, let's say, does not have "DSAT", then the resulting data.frame is not correct as the missing value is not recognized by the table command from the apply.

Basically the output would be like the following:

      Q1 Q2 Q3 Q4 Q5       Satisfaction
      12 16 22 24 23  Very dissatisfied
      27 30 33 24 33       Dissatisfied
      49 36 33 30 32            Neutral
       6 11 17 25 22          Satisfied
      22 23 11 13  6     Very satisfied

Thanks very much

Kind regards,

EDIT: Sample of raw data:

Q1              Q2              Q3                  Q4              Q5  
Very satisfied  Very satisfied  Very satisfied      Very satisfied  Very satisfied  
Satisfied       Dissatisfied    Very dissatisfied   Dissatisfied    Satisfied   
Very satisfied  Very satisfied  Very satisfied      Very satisfied  Very satisfied
Satisfied       Satisfied       Satisfied           Satisfied       Satisfied   
Very satisfied  Very satisfied  Very satisfied      Very satisfied  Very satisfied  
...             ...             ...                 ...             ...

Sample input would help reproduce the problem and give a working solution. — Tensibai, Aug 27 '15 at 12:39
can you show the `head(dput())` of the data instead? It would be much easier to work with that — erasmortg, Aug 27 '15 at 12:53

talat · Accepted Answer · 2015-08-27T12:59:46.210

2

Here's an approach using dplyr and tidyr. The idea is to reshape your data from wide to long format, count occurences per answer and question and the spread the data back to a wide format.

library(tidyr)
library(dplyr)

gather(dat, Question, Answer) %>% 
  count(Question, Answer) %>% 
  spread(Question, n, fill = 0L)
#Source: local data frame [5 x 6]
#
#  Answer Q1 Q2 Q3 Q4 Q5
#1   DSAT  1  0  1  3  0
#2    NTL  0  0  0  1  2
#3    SAT  0  1  0  0  1
#4  VDSAT  1  3  2  0  1
#5   VSAT  2  0  1  0  0

Sample data I used:

set.seed(12)
dat <- as.data.frame(matrix(sample(c("VDSAT","DSAT","NTL","SAT","VSAT"), 20, TRUE), ncol = 5))
dat[] <- lapply(dat, factor, levels = c("VDSAT","DSAT","NTL","SAT","VSAT"))
names(dat) <- paste0("Q", 1:5)

edited Aug 27 '15 at 12:59

answered Aug 27 '15 at 12:54

talat

68,970
21
126
157

4

Or `dat$indx <- 1 ; reshape2::recast(dat, value ~ variable, id.var = "indx")` – David Arenburg Aug 27 '15 at 13:00
1

would you mind posting your answer also on [this question](http://stackoverflow.com/questions/26291674/create-frequency-tables-for-multiple-factor-columns-in-r) ? The question has a higher score so will be easier to find for users but has "only" `sapply` type answers so owuld benefit from your answer imo – Cath Aug 27 '15 at 13:18

R: Counting appearances of strings for each column in data frame

1 Answers1