2

I am trying to do a simple table after have melt my data, but using dplyr.

My data look like this

   cluster   21:30   21:45
4        c   alone   alone
6        b       %       %
12       e partner partner
14       b partner partner
20       b   alone   alone
22       c partner partner

With table I can simply

table(dta$cluster)
   a b c d e 
   2 8 5 1 4 

How can I get the same results using melt and summarise?

 library(dplyr)
 library(reshape2)

 dta %>% 
 melt(id.vars = 'cluster')  %>% 
 group_by(cluster) %>% 
 summarise( n() ) 

What I need exactly is to table the clusters after having melt the data.

So to count correctly this data.frame

 dta %>% 
 melt(id.vars = 'cluster')

The expected output is this one

      cluster variable   value n_cluster
1        a    21:30       .         2
2        a    21:30 nuclear         2
3        a    21:45       .         2
4        a    21:45 nuclear         2
5        b    21:30       %         8
6        b    21:30 partner         8
7        b    21:30   alone         8
8        b    21:30 partner         8
9        b    21:30 partner         8
10       b    21:30 nuclear         8
11       b    21:30 partner         8
12       b    21:30 partner         8
13       b    21:45       %         8
14       b    21:45 partner         8
15       b    21:45   alone         8
16       b    21:45 partner         8
17       b    21:45 partner         8
18       b    21:45 nuclear         8
19       b    21:45 partner         8
20       b    21:45 partner         8
21       c    21:30   alone         5
22       c    21:30 partner         5
23       c    21:30       %         5
24       c    21:30 partner         5
25       c    21:30 partner         5
26       c    21:45   alone         5
27       c    21:45 partner         5
28       c    21:45       %         5
29       c    21:45 partner         5
30       c    21:45 partner         5
31       d    21:30 partner         1
32       d    21:45   alone         1
33       e    21:30 partner         4
34       e    21:30 nuclear         4
35       e    21:30 nuclear         4
36       e    21:30 nuclear         4
37       e    21:45 partner         4
38       e    21:45 nuclear         4
39       e    21:45 nuclear         4
40       e    21:45 nuclear         4

Any idea?

dta = structure(list(cluster = structure(c(3L, 2L, 5L, 2L, 2L, 3L, 
5L, 3L, 1L, 3L, 1L, 2L, 5L, 3L, 2L, 2L, 2L, 2L, 4L, 5L), .Label = c("a", 
"b", "c", "d", "e"), class = "factor"), `21:30` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 5L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor"), `21:45` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 2L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor")), .Names = c("cluster", 
"21:30", "21:45"), row.names = c("4", "6", "12", "14", "20", 
"22", "23", "28", "30", "32", "36", "38", "40", "42", "44", "48", 
"50", "56", "57", "60"), class = "data.frame")
giac
  • 4,261
  • 5
  • 30
  • 59
  • you don;t need melt, and you have your answer: `dta %>% group_by(cluster) %>% summarise( n() )` – jeremycg Aug 12 '15 at 20:43
  • 3
    I think just `count(dta, cluster)` if you already at it – David Arenburg Aug 12 '15 at 20:43
  • My first instinct when I see "my data looks like this" is to use `read.table` to put it into R... maybe it would be better to simply use the smaller data or to omit the arguably misleading display unless preceded by a clear `head(dta)` call. – Frank Aug 12 '15 at 20:43
  • ok I will do it in Excel later tonight but for now I need to do it like this :/ I know right – giac Aug 12 '15 at 20:43
  • 4
    I hope you aren't serious regarding Excel – David Arenburg Aug 12 '15 at 20:44
  • If I ask the problem I'm trying to solve with this you guys are going to kill me. So please any answer ? David please don't let me do it in excel – giac Aug 12 '15 at 21:00
  • 2
    @giacomoV `count(dta, cluster)` in the comment above is *the* answer to the question as posed... – Frank Aug 12 '15 at 21:10

2 Answers2

6

I can't seem to find a good dupe for this, but a simple dplyr idiom will be just using count

count(dta, cluster)
# Source: local data frame [5 x 2]
# 
#   cluster n
# 1       a 2
# 2       b 8
# 3       c 5
# 4       d 1
# 5       e 4

Per your new desired output, you could just left join this result to your melted data set

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  left_join(., count(dta, cluster)) %>%
  arrange(cluster)
#    cluster variable   value n
# 1        a    21:30       . 2
# 2        a    21:30 nuclear 2
# 3        a    21:45       . 2
# 4        a    21:45 nuclear 2
# 5        b    21:30       % 8
# 6        b    21:30 partner 8
# 7        b    21:30   alone 8
#...
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
1

When counting the distribution of a variable for repeated observations, one should take into account the number of observations.

In this example

n_episode = 2 

Then the code becomes straightforward

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  group_by(cluster) %>% 
  mutate( n_cluster = n() / n_episode) %>% 
  arrange(cluster)

One can use this result (n_episode) in order to calculate mean for groups of different sizes for example.

giac
  • 4,261
  • 5
  • 30
  • 59