
I'm using dplyr (and may need tidyr?) and have a list of "samples" and the "group" each one belongs to. Some samples belong to multiple groups. I'd like to group_by and summarize this data so that each sample gets a single delimited character string listing the groups it belongs to. I realize this isn't "tidy" data, but it's for the final report, so no further processing is needed.

Here's what I have:

data_frame(sample=c("A", "B", "C", "A", "B", "D", "E"),
            group=c("g1", "g1", "g2", "g2", "g3", "g3", "g3"))
Source: local data frame [7 x 2]

  sample group
1      A    g1
2      B    g1
3      C    g2
4      A    g2
5      B    g3
6      D    g3
7      E    g3

Here's what I want:

data_frame(sample=c("A", "B", "C", "D", "E"),
            groups=c("g1; g2", "g1; g3", "g2", "g3", "g3"))
Source: local data frame [5 x 2]

  sample groups
1      A g1; g2
2      B g1; g3
3      C     g2
4      D     g3
5      E     g3
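A minimal sketch of one way to get this table, assuming a current dplyr where `tibble()` replaces the deprecated `data_frame()`. It groups by `sample` and uses base R's `paste(..., collapse = )` inside `summarise()` to join each sample's groups into one string; tidyr is not needed for this step.

```r
library(dplyr)

# Input data from the question (tibble() is the current name for data_frame())
df <- tibble(sample = c("A", "B", "C", "A", "B", "D", "E"),
             group  = c("g1", "g1", "g2", "g2", "g3", "g3", "g3"))

# One row per sample; paste(collapse = "; ") joins that sample's groups,
# in their original row order, into a single delimited string
result <- df %>%
  group_by(sample) %>%
  summarise(groups = paste(group, collapse = "; ")) %>%
  ungroup()

print(result)
```

Because `group_by()` orders the output by `sample`, the rows come out as A through E, matching the desired table above.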

The delimiter doesn't have to be a semicolon, though a semicolon has the advantage of not clashing with the commas in CSV output. The solution should let me choose the delimiter.
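To make the delimiter a choice, one option is to wrap the summarise step in a small function. The name `collapse_groups()` and its `sep` argument are hypothetical, and the function assumes the columns are named `sample` and `group` as in the question. Sorting and de-duplicating the groups is an extra design choice, not something the question asks for: it keeps the output stable if the input rows arrive in a different order or a sample is listed twice in the same group.

```r
library(dplyr)

# Hypothetical helper: collapse each sample's groups into one string.
# `sep` is the delimiter placed between group names.
# Assumes the input has columns named `sample` and `group`.
collapse_groups <- function(data, sep = "; ") {
  data %>%
    group_by(sample) %>%
    # sort(unique()) drops duplicate groups and fixes their order
    summarise(groups = paste(sort(unique(group)), collapse = sep)) %>%
    ungroup()
}

df <- tibble(sample = c("A", "B", "C", "A", "B", "D", "E"),
             group  = c("g1", "g1", "g2", "g2", "g3", "g3", "g3"))

out_semi <- collapse_groups(df)             # default "; " delimiter
out_pipe <- collapse_groups(df, sep = "|")  # custom delimiter
```

Dropping `sort(unique(...))` and using `group` directly keeps the groups in their original row order instead.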

Frank
Stephen Turner
