I am familiar with some of the split-apply-combine functions in R, like ddply, but I am unsure how to split a data frame, modify a single variable within each subset, and then recombine the subsets. I can do this manually, but there is surely a better way.

In my example, I am trying to shuffle a single variable (but none of the others) within each group. This is for a permutation analysis, so I am doing it many times and would like to speed things up.

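library(plyr)  # provides rbind.fill()

# Split by group, shuffle 'party' within each subset, then recombine
allS <- split(all, f = all$cp)
for (j in seq_along(allS)) {
    allS[[j]]$party <- sample(x = allS[[j]]$party)
}
tmpAll <- rbind.fill(allS)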

Sample data frame:

all <- data.frame(cp=factor(1:5), party=rep(c("A","B","C","D"), 5))

Thanks for any direction!

Michael Davidson

2 Answers

We can use data.table. Convert the 'data.frame' to a 'data.table' by reference (setDT(all)), then, grouped by 'cp', sample 'party' and assign (:=) the result back to the 'party' column.

library(data.table)
setDT(all)[, party := sample(party), by = cp]
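
Since the question mentions repeating the shuffle many times, here is a minimal sketch of how the grouped assignment might sit inside a permutation loop. The names `dat`, `group_contents` and `n_perm` are hypothetical, chosen for illustration (and `dat` avoids masking base::all); the permutation statistic itself is left as a placeholder because the question does not specify it.

```r
library(data.table)

# Sample data from the question, stored under the hypothetical name `dat`
dat <- as.data.table(
  data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5))
)

# Helper (hypothetical): sorted 'party' labels per group, as one string each
group_contents <- function(d) {
  d[, .(parties = paste(sort(as.character(party)), collapse = "")), by = cp]$parties
}
before <- group_contents(dat)

n_perm <- 100  # assumed number of permutations
for (i in seq_len(n_perm)) {
  dat[, party := sample(party), by = cp]  # shuffles in place, by reference
  # ... compute the permutation statistic on 'dat' here ...
}

# Each group still holds the same set of 'party' values; only the order changed
after <- group_contents(dat)
stopifnot(identical(before, after))
```

Because := updates the column by reference, the loop does not copy the table on each iteration, which is likely where most of the speed-up over split/rbind.fill comes from.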
akrun

The dplyr way.

library(dplyr)
all %>% group_by(cp) %>% mutate(party=sample(party))
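
A slightly fuller sketch, under the assumption that you want to keep the shuffled result and drop the grouping afterwards. The names `df` and `shuffled` are hypothetical; `ungroup()` is added so later operations are not silently applied per group.

```r
library(dplyr)

# Sample data from the question, stored under the hypothetical name `df`
df <- data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5))

shuffled <- df %>%
  group_by(cp) %>%
  mutate(party = sample(party)) %>%  # permute 'party' within each 'cp' group
  ungroup()                          # return an ungrouped tibble

# Row order and 'cp' are untouched; only 'party' moves within its group
stopifnot(identical(shuffled$cp, df$cp))
stopifnot(all(table(shuffled$cp, shuffled$party) == table(df$cp, df$party)))
```

The pipeline returns a new tibble rather than modifying `df`, so the original data stay available for the next permutation.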
Ven Yao