Fastest Way to Split a Data Frame by Group and Shuffle a Single Column in R

Question

I am familiar with some of the split-apply-combine functions in R, like ddply, but I am unsure how to split a data frame, modify a single variable within each subset, and then recombine the subsets. I can do this manually, but there is surely a better way.

In my example, I am trying to shuffle a single variable (but none of the others) within each group. This is for a permutation analysis, so I am doing it many, many times and would like to speed things up.

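library(plyr)  # rbind.fill() comes from plyr

allS <- split(all, f = all$cp)
for (j in seq_along(allS)) {
    allS[[j]]$party <- sample(allS[[j]]$party)
}
tmpAll <- rbind.fill(allS)  # rows come back ordered by cp, not in the original row order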
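A base-R alternative, offered as a sketch rather than the asker's method, skips the split/rbind round trip: ave() applies a function to each group of a vector and writes the results back into the original row positions. The helper shuffle() is a hypothetical name introduced here; it permutes by indexing with sample.int() because sample(v) treats a single number n as 1:n. The sketch assumes 'party' is stored as character (stringsAsFactors = FALSE).

```r
# Base-R sketch: shuffle `party` within each level of `cp`, keeping
# every other column and the original row order unchanged.
all <- data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5),
                  stringsAsFactors = FALSE)

# Permute a vector by indexing; unlike sample(v), this is safe when a
# group holds a single number (sample(5) would shuffle 1:5 instead).
shuffle <- function(v) v[sample.int(length(v))]

# ave() splits `party` by `cp`, applies shuffle() to each piece, and
# puts the shuffled values back at the same row positions.
all$party <- ave(all$party, all$cp, FUN = shuffle)
```

Because the shuffled values are written back in place, the result stays aligned row-for-row with the other columns, which the rbind.fill() output does not.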

Sample data frame:

all <- data.frame(cp=factor(1:5), party=rep(c("A","B","C","D"), 5))
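Since the shuffle is repeated many times for a permutation analysis, one option is to compute each group's row indices once and reuse them, so each permutation only calls sample.int() per group. This is a sketch under assumptions: permute_within(), idx and n_perm are hypothetical names, and the question does not specify the test statistic, so the code only builds the permuted columns.

```r
# Sketch for repeated permutations: the grouping never changes, so find
# each group's row indices once and only re-shuffle within them.
all <- data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5),
                  stringsAsFactors = FALSE)

idx <- split(seq_len(nrow(all)), all$cp)  # row numbers for each cp level

# Return a copy of `x` whose values are permuted only within each group.
permute_within <- function(x, idx) {
  out <- x
  for (rows in idx) {
    out[rows] <- x[rows[sample.int(length(rows))]]
  }
  out
}

set.seed(1)
n_perm <- 1000
# Each column of `perms` is one within-group shuffle of `party`.
perms <- replicate(n_perm, permute_within(all$party, idx))
```

Each column of perms can then be passed to whatever statistic the analysis uses.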

Thanks for any direction!

score 4 · Accepted Answer · answered Dec 10 '15 at 18:53


We can use data.table. We convert the 'data.frame' to a 'data.table' in place (setDT(all)), group by 'cp', sample 'party' within each group, and assign (:=) the result back to the 'party' column.

library(data.table)
# setDT() converts 'all' in place; := then overwrites 'party' by reference within each cp group
setDT(all)[, party := sample(party), by = cp]


akrun


score 2 · Answer 2 · answered Dec 11 '15 at 06:52


The dplyr way.

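library(dplyr)
# mutate() returns a new data frame, so assign the result; ungroup() drops the grouping
shuffled <- all %>% group_by(cp) %>% mutate(party = sample(party)) %>% ungroup()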


Ven Yao

