1

I have the following data frame in R:

objects   categories
   A       162
   B       162
   B       190
   C       123
   C       162
   C       185
   C       190
   C        82
   C       191
   D       185

As you can see, it lists objects and the categories they belong to. I would like to collapse the categories of each object into a comma-separated list, to get a data frame that looks like this:

 objects   categories
   A       162
   B       162, 190
   C       123, 162, 185, 190, 82, 191
   D       185

How could I do this?

Matt Dowle
sabsirro

5 Answers

4
aggregate(categories~objects,data=x,FUN=paste)
  objects                  categories
1       A                         162
2       B                    162, 190
3       C 123, 162, 185, 190, 82, 191
4       D                         185
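
For anyone wanting to reproduce this, here is a minimal sketch. It assumes the data frame is named x, as in the call above, and rebuilds it from the values in the question. Note that with FUN = paste and no collapse argument, the groups come back with different lengths, so categories is stored as a list column of character vectors; the commas only appear when it is printed. Collapsing explicitly is one way to get a real string per object.

```r
# Rebuild the example data from the question (the name `x` is assumed).
x <- data.frame(
  objects    = c("A", "B", "B", "C", "C", "C", "C", "C", "C", "D"),
  categories = c(162, 162, 190, 123, 162, 185, 190, 82, 191, 185)
)

# FUN = paste without `collapse` returns one character vector per group,
# so `categories` ends up as a list column.
res_list <- aggregate(categories ~ objects, data = x, FUN = paste)
str(res_list$categories)

# Collapsing inside FUN yields a single string per object instead.
res_chr <- aggregate(categories ~ objects, data = x,
                     FUN = function(v) paste(v, collapse = ", "))
res_chr
```

Which form is preferable probably depends on whether the result is only for display or will be processed further.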
James
  • Just beat me to it. I had posted `aggregate(list(categories=df$categories), by=list(objects=df$objects), c)` – A5C1D2H2I1M1N2O1R2T1 Jul 24 '12 at 16:26
  • @mrdwab I was just about to upvote it. I'd suggest undeleting it as using `c` may be preferable to `paste` if the OP wants to calculate with the numbers later. – James Jul 24 '12 at 16:31
  • true. I generally prefer `c` for flexibility further down the line. I had just felt the two responses were *so* much the same. Undeleted--we'll see what others think! :-) – A5C1D2H2I1M1N2O1R2T1 Jul 24 '12 at 16:35
3

This can be done with any aggregation tool of your choice. I'll show an example using the plyr package and the paste() function. This assumes your data is named x:

library(plyr)
ddply(x, .(objects), summarize, categories = paste(categories, collapse = ","))
#-----
  objects             categories
1       A                    162
2       B                162,190
3       C 123,162,185,190,82,191
4       D                    185
Chase
  • 2
    The next logical question will be "which aggregation tool should I use?" I gave a pretty good effort answering that one [here](http://stackoverflow.com/questions/10748253/idiomatic-r-code-for-partitioning-a-vector-by-an-index-and-performing-an-operati/10748470#10748470) should you be interested. – Chase Jul 24 '12 at 16:28
2

As the title of your question implies, use aggregate:

aggregate(list(categories=df$categories), by=list(objects=df$objects), c)
#   objects                  categories
# 1       A                         162
# 2       B                    162, 190
# 3       C 123, 162, 185, 190, 82, 191
# 4       D                         185
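
A small sketch of why keeping the values with c can be handy, assuming the data frame is named df as above and rebuilt here from the question's values. Because the groups differ in length, categories comes back as a list column of numeric vectors, so it can still be computed on; a display string is built only at the end. (If every object happened to have the same number of categories, aggregate would simplify the result to a vector or matrix instead.)

```r
# Rebuild the example data from the question (the name `df` is assumed).
df <- data.frame(
  objects    = c("A", "B", "B", "C", "C", "C", "C", "C", "C", "D"),
  categories = c(162, 162, 190, 123, 162, 185, 190, 82, 191, 185)
)

res <- aggregate(list(categories = df$categories),
                 by = list(objects = df$objects), c)

# Number of categories per object, computed on the numeric vectors.
n_per_object <- lengths(res$categories)

# Build a comma-separated display string only when it is needed.
as_text <- vapply(res$categories, toString, character(1))
```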
A5C1D2H2I1M1N2O1R2T1
1

aggregate: If DF is your data frame, then try this:

aggregate(categories ~ objects, DF, function(x) toString(unique(x)))
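
To see what unique adds here, a sketch with one duplicated row appended to the question's data (the extra B/190 row is made up for illustration). toString(x) is shorthand for paste(x, collapse = ", ").

```r
# Question data plus one hypothetical duplicate row (B, 190).
DF <- data.frame(
  objects    = c("A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "B"),
  categories = c(162, 162, 190, 123, 162, 185, 190, 82, 191, 185, 190)
)

# Without unique(), the repeated category is kept.
with_dups <- aggregate(categories ~ objects, DF, toString)

# With unique(), each category appears once per object.
no_dups <- aggregate(categories ~ objects, DF, function(x) toString(unique(x)))

with_dups$categories[2]  # "162, 190, 190"
no_dups$categories[2]    # "162, 190"
```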

sqldf: With sqldf, this works:

library(sqldf)
sqldf("select objects, group_concat(distinct categories) as categories
  from DF group by objects")
G. Grothendieck
1

A data.table solution

library(data.table)
DT <- as.data.table(DF)
DT[,list(categories = list(categories)), by = objects]

##    objects             categories
## 1:       A                    162
## 2:       B                162,190
## 3:       C 123,162,185,190,82,191
## 4:       D                    185
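
The list(categories) call above keeps the values in a list column. If a plain character column is wanted instead, one possible variant is to collapse with toString inside j. A sketch, assuming the data.table package is installed and DF holds the question's data (rebuilt here):

```r
library(data.table)

# Rebuild the example data from the question (the name `DF` is assumed).
DF <- data.frame(
  objects    = c("A", "B", "B", "C", "C", "C", "C", "C", "C", "D"),
  categories = c(162, 162, 190, 123, 162, 185, 190, 82, 191, 185)
)
DT <- as.data.table(DF)

# List column: each row holds the numeric vector for that object.
DT_list <- DT[, list(categories = list(categories)), by = objects]

# Character column: one comma-separated string per object.
DT_chr <- DT[, list(categories = toString(categories)), by = objects]
```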
mnel