This question is related to "Sort R rows based on the number of repetition".

From the following data frame:

> ddf
   aa  bb
1   c efg
2   d cde
3   d abc
4   c abc
5   b efg
6   b cde
7   c abc
8   c abc
9   c cde
10  b cde
> 
> 
> dput(ddf)
structure(list(aa = structure(c(2L, 3L, 3L, 2L, 1L, 1L, 2L, 2L, 
2L, 1L), .Label = c("b", "c", "d"), class = "factor"), bb = structure(c(3L, 
2L, 1L, 1L, 3L, 2L, 1L, 1L, 2L, 2L), .Label = c("abc", "cde", 
"efg"), class = "factor")), .Names = c("aa", "bb"), row.names = c(NA, 
-10L), class = "data.frame")

I can sort it:

> ddf[order(ddf$bb),]
   aa  bb
3   d abc
4   c abc
7   c abc
8   c abc
2   d cde
6   b cde
9   c cde
10  b cde
1   c efg
5   b efg

and I can tabulate it as follows:

> t(with(ddf, table(aa,bb)))
     aa
bb    b c d
  abc 0 3 1
  cde 2 1 1
  efg 1 1 0

But I want output like the following:

abc  c c c d
cde  b b c d
efg  b c

I tried:

ll = list()
for(xx in unique(ddf$bb)) {
 ll[[length(ll)+1]] = xx
 ll[[length(ll)+1]] = ddf[ddf$bb==xx,]$aa
}

ll
[[1]]
[1] "efg"

[[2]]
[1] c b
Levels: b c d

[[3]]
[1] "cde"

[[4]]
[1] d b c b
Levels: b c d

[[5]]
[1] "abc"

[[6]]
[1] d c c c
Levels: b c d

But I cannot combine these to get output like:

abc  c c c d
cde  b b c d
efg  b c

The values b, c, d, etc. should be sorted within each row, as shown above. Thanks for your help.

Edit: It works with the answer provided by @Richard Scriven:

> aggregate(aa ~ bb, ddf, function(x) paste(sort(x)))
   bb         aa
1 abc c, c, c, d
2 cde b, b, c, d
3 efg       b, c

But why does the following (which I had tried earlier) give only numbers?

> aggregate(aa ~ bb, ddf, function(x) sort(x))
   bb         aa
1 abc 2, 2, 2, 3
2 cde 1, 1, 2, 3
3 efg       1, 2
rnso

1 Answer

You could use aggregate() with an anonymous function that sorts the values in each group and then pastes them into a single string.

aggregate(aa ~ bb, ddf, function(x) paste(sort(x), collapse = " "))
#    bb      aa
# 1 abc c c c d
# 2 cde b b c d
# 3 efg     b c
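
On the follow-up about the numbers: 2, 2, 2, 3 and so on match the integer codes of the factor levels of aa (b = 1, c = 2, d = 3). A plausible reading, offered as an assumption rather than something confirmed above, is that sort(x) returns a factor and the codes are what gets displayed; the exact display may vary by R version. A minimal sketch that avoids this by converting to character first, with a split()/sapply() variant that is not part of the original answer:

```r
# Rebuild the question's data frame; stringsAsFactors = TRUE mirrors the dput() output.
ddf <- data.frame(
  aa = c("c", "d", "d", "c", "b", "b", "c", "c", "c", "b"),
  bb = c("efg", "cde", "abc", "abc", "efg", "cde", "abc", "abc", "cde", "cde"),
  stringsAsFactors = TRUE
)

# Convert the factor to character before sorting, so each group holds letters
# rather than a factor whose integer codes might be shown instead.
res <- aggregate(aa ~ bb, ddf, function(x) sort(as.character(x)))

# One string per group, built with split() and sapply() instead of aggregate().
joined <- sapply(
  split(as.character(ddf$aa), ddf$bb),
  function(x) paste(sort(x), collapse = " ")
)
print(joined)
```

Here joined is a named character vector with one element per level of bb, which can be easier to reuse than a data frame with a list column.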
Rich Scriven