I'm using dplyr (and may need tidyr?), and I have a list of "samples" along with the "group" each one belongs to. Some samples belong to multiple groups. I'd like to group_by and summarize this data so that I get one row per sample, with a single delimited string listing the groups that sample belongs to. I realize this isn't "tidy" data, but it's for the final report stage and needs no further processing.
Here's what I have:
data_frame(sample = c("A", "B", "C", "A", "B", "D", "E"),
           group  = c("g1", "g1", "g2", "g2", "g3", "g3", "g3"))

Source: local data frame [7 x 2]

  sample group
1      A    g1
2      B    g1
3      C    g2
4      A    g2
5      B    g3
6      D    g3
7      E    g3
Here's what I want:
data_frame(sample = c("A", "B", "C", "D", "E"),
           groups = c("g1; g2", "g1; g3", "g2", "g3", "g3"))

Source: local data frame [5 x 2]

  sample groups
1      A g1; g2
2      B g1; g3
3      C     g2
4      D     g3
5      E     g3
The delimiter doesn't have to be a semicolon; I picked it because it won't clash with the commas in a CSV file. Ideally, the solution would let me choose the delimiter.
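
For what it's worth, here is a minimal sketch of the sort of thing I imagine might work, assuming that paste() with its collapse argument can be used inside summarise(). The helper collapse_groups() and its sep parameter are hypothetical names made up for illustration, and base data.frame() stands in for data_frame() so the snippet only needs dplyr. There may well be a more idiomatic way.

```r
library(dplyr)

# Same example data as above, built with base data.frame()
samples <- data.frame(
  sample = c("A", "B", "C", "A", "B", "D", "E"),
  group  = c("g1", "g1", "g2", "g2", "g3", "g3", "g3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per sample, with that sample's groups
# pasted into a single string separated by `sep`
collapse_groups <- function(df, sep = "; ") {
  df %>%
    group_by(sample) %>%
    summarise(groups = paste(group, collapse = sep))
}

result <- collapse_groups(samples, sep = "; ")
print(result)
```

Passing a different sep (for example sep = "|") would change the delimiter without touching the rest of the pipeline.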