0

My data looks a bit like this:

Q1  Q2  Q3  Q4
S   NS  S   S
NS  .   S   PMI
S   S   TMI S
PMI S   S   NS

The only options are S, TMI, PMI, NS (or missing, which is a .). I would like it to look like this (df called PCP):

    Q1  Q2  Q2  Q4
S   2   2   3   2
NS  1   1   0   1
PMI 1   0   0   1
TMI 0   0   1   0
.   0   1   0   0

I can do it in 2 steps with this code:

Counts <-  melt(table(PCP$Q1)) %>%
  join(y = melt(table(PCP$Q2)),  type = "full") %>%
  join(y = melt(table(PCP$Q3)),  type = "full") %>%
  join(y = melt(table(PCP$Q4)),  type = "full")
Counts <- melt(Counts, key='Var1')

Bur is there a nice way of doing this in one step maybe using dplyr/plyr packages?

morgan121
  • 2,213
  • 1
  • 15
  • 33
  • 3
    Not `dplyr` but `table(stack(PCP))` – thelatemail Oct 18 '18 at 05:21
  • Thanks for the response! Unfortunately stack just gives the error: "Error in stack.data.frame(PCP) : no vector columns were selected" – morgan121 Oct 18 '18 at 05:30
  • That's because you have `factor` columns instead of `character` columns in your data.frame (and rightfully so - stacking/gathering/melting data from different categorical variables with potentially different labels doesn't make sense). It wasn't clear from your question but I assure you it works if you read your data in with `stringsAsFactors=FALSE` – thelatemail Oct 18 '18 at 05:31
  • @thelatemail should be added as answer IMO. – Ronak Shah Oct 18 '18 at 05:34
  • right you are @thelatemail!! What a legend, works very well now :) – morgan121 Oct 18 '18 at 05:38

2 Answers2

0

How about this:

library(tidyverse)

PCP %>% 
  gather(question, answer) %>%
  mutate(answer = replace(answer, answer==".", "Missing")) %>% 
  count(question, answer) %>%
  spread(question, n, fill=0)
  answer     Q1    Q2    Q3    Q4
1 Missing     0     1     0     0
2 NS          1     1     0     1
3 PMI         1     0     0     1
4 S           2     2     3     2
5 TMI         0     0     1     0
eipi10
  • 91,525
  • 24
  • 209
  • 285
0

Using tidyverse and reshape2:

df %>%
  gather(var, val) %>%
  group_by(var) %>%
  add_count(val) %>%
  dcast(val ~ var, value.var = "n")

  val Q1 Q2 Q3 Q4
1   .  0  1  0  0
2  NS  1  1  0  1
3 PMI  1  0  0  1
4   S  2  2  3  2
5 TMI  0  0  1  0
tmfmnk
  • 38,881
  • 4
  • 47
  • 67