How can I sum a categorical variable and aggregate by factor?

Question

To be more specific, I have a dataset with these two columns:

SOCCERTEAM  PLAYERS
BARCA       MESSI
BARCA       MESSI
BARCA       MESSI
BARCA       XAVI
RM          CR
RM          CR
RM          PEPE
RM          HIQUAIN

etc. (this is just an example, not the actual dataset)

I want to answer this question: how can I find the top 5 teams according to how many players they used? A team can use the same player more than once, so just counting the occurrences of each factor level does not work. For example, if BARCA used 15 different players and RM used 14, BARCA should rank first.
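A minimal base-R sketch of the distinction, using a hypothetical data frame `df` rebuilt from the example rows above (not the real dataset): counting rows per team counts every appearance, while counting unique players per team gives the number the question asks for.

```r
# Hypothetical example data rebuilt from the question (not the real dataset)
df <- data.frame(
  SOCCERTEAM = c("BARCA", "BARCA", "BARCA", "BARCA", "RM", "RM", "RM", "RM"),
  PLAYERS    = c("MESSI", "MESSI", "MESSI", "XAVI", "CR", "CR", "PEPE", "HIQUAIN"),
  stringsAsFactors = TRUE
)

# Counting rows per team counts repeated appearances: BARCA 4, RM 4
rows_per_team <- table(df$SOCCERTEAM)

# Counting distinct players per team: BARCA 2, RM 3
distinct_per_team <- tapply(df$PLAYERS, df$SOCCERTEAM,
                            function(p) length(unique(p)))

# Order teams by distinct players, highest first, and keep at most 5
top_teams <- head(names(distinct_per_team)[order(distinct_per_team, decreasing = TRUE)], 5)
print(top_teams)
```

With this data RM ranks above BARCA, even though both teams have the same number of rows.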

Try `library(data.table);head(setDT(df1)[, .(n = uniqueN(PLAYERS)), SOCCERTEAM][order(-n)]$SOCCERTEAM, 5)` — akrun, May 21 '17 at 16:54
@akrun Thanks for the help, it worked, although I don't really understand this part: `[, .(n = uniqueN(PLAYERS)), SOCCERTEAM][order(-n)]$SOCCERTEAM, 5)`. Why do we use `[ ]` after `setDT(df1)`? — Fallen Greg, May 21 '17 at 17:21
You should probably take a look at [Getting Started with `data.table`](https://github.com/Rdatatable/data.table/wiki/Getting-started). — Gregor Thomas, May 21 '17 at 17:37
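On the `[ ]` question: after `setDT()` the object is a data.table, whose `[` takes the form `DT[i, j, by]` (which rows, what to compute, how to group), and a second `[ ]` can be chained onto the result. The sketch below splits akrun's one-liner into those steps; the data frame `df1` is a hypothetical stand-in built from the question's example rows.

```r
library(data.table)

# Hypothetical example data with the question's SOCCERTEAM / PLAYERS columns
df1 <- data.frame(
  SOCCERTEAM = c("BARCA", "BARCA", "BARCA", "BARCA", "RM", "RM", "RM", "RM"),
  PLAYERS    = c("MESSI", "MESSI", "MESSI", "XAVI", "CR", "CR", "PEPE", "HIQUAIN"),
  stringsAsFactors = FALSE
)

# setDT() converts df1 to a data.table by reference, enabling DT[i, j, by]
dt <- setDT(df1)

# Step 1: i is empty (all rows), j counts unique players, by groups by team
counts <- dt[, .(n = uniqueN(PLAYERS)), by = SOCCERTEAM]

# Step 2: a chained [ ] whose i argument, order(-n), sorts by n descending
ranked <- counts[order(-n)]

# Step 3: pull out the team column and keep the first 5 entries
top5 <- head(ranked$SOCCERTEAM, 5)
print(top5)
```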

yeedle · Accepted Answer · 2017-05-21T17:38:14.673


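library(dplyr)

df %>% 
  group_by(SOCCERTEAM) %>% 
  # number of distinct players per team ('rank' renamed to n_players, see the comments below)
  summarize(n_players = n_distinct(PLAYERS)) %>%
  top_n(5, wt = n_players) %>%  # 5 teams with the most players; ties can return more rows
  arrange(desc(n_players))      # highest count first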

edited May 21 '17 at 17:38

answered May 21 '17 at 17:32


Error in mutate_impl(.data, dots) : invalid subscript type 'list' – Fallen Greg May 21 '17 at 17:41
Maybe try renaming `rank` to something else, e.g. `summarize(n_players = n_distinct(PLAYERS)) %>% top_n(5, n_players)` – yeedle May 21 '17 at 17:44
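Since dplyr 1.0.0, `top_n()` is superseded by `slice_max()`. Below is a sketch of the same answer using `slice_max()`, assuming dplyr 1.0.0 or later and the same hypothetical example data. Setting `with_ties = FALSE` is a choice made here so the result never exceeds 5 rows, even when teams tie on the count.

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

# Hypothetical example data (not the asker's real dataset)
df <- data.frame(
  SOCCERTEAM = c("BARCA", "BARCA", "BARCA", "BARCA", "RM", "RM", "RM", "RM"),
  PLAYERS    = c("MESSI", "MESSI", "MESSI", "XAVI", "CR", "CR", "PEPE", "HIQUAIN"),
  stringsAsFactors = FALSE
)

top_teams <- df %>%
  group_by(SOCCERTEAM) %>%
  summarize(n_players = n_distinct(PLAYERS)) %>%  # distinct players per team
  # keep the 5 teams with the most distinct players, at most 5 rows
  slice_max(n_players, n = 5, with_ties = FALSE)

print(top_teams)
```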
