
I am having trouble stratifying random permutations.

My data look like this:

     gender party        value
1      F    Democrat      762
2      M    Democrat      484
3      F    Independent   327
4      M    Independent   239
5      F    Republican    468
6      M    Republican    477

I simply want a random permutation of gender, stratified by party:

library(dplyr)
md %>% 
  group_by(party) %>% 
  mutate(perm = sample(gender))

This gives me a correct stratified random permutation:

     gender party        value   perm
1      F    Democrat      762      M
2      M    Democrat      484      F
3      F    Independent   327      M
4      M    Independent   239      F
5      F    Republican    468      F
6      M    Republican    477      M

I would like to repeat this operation many times. Following the solution proposed here (for a non-stratified permutation), I tried:

library(broom) 
md %>% 
 bootstrap(100) %>% 
 do(data.frame(., treat = sample(.$gender, 6, replace=TRUE)))

However, I have not managed to add a group_by step:

md %>% 
  bootstrap(10) %>% 
  group_by(party) %>% 
  do(data.frame(., treat = sample(.$gender, 6, replace=TRUE)))

Any ideas?

Also, the bootstrap function is actually quite slow. Any idea why? Is there a way to make it faster, for example by parallelising it?

# Reproducible data
library(reshape2)
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat", "Independent", "Republican"))
md <- melt(M)
giac
  • Does `md %>% group_by(party) %>% bootstrap(10, by_group = TRUE) %>% do(data.frame(., treat = sample(.$gender, 6, replace=TRUE)))` work? – Axeman Sep 30 '16 at 11:34
  • You wrote, *What I would like is to repeat this operation many times*. But what is the expected output? What about using `replicate`? – agstudy Sep 30 '16 at 11:36
  • @agstudy replicate is a possibility; I used it in the question mentioned. It just makes the code cumbersome. – giac Sep 30 '16 at 11:37
  • @Axeman No, actually. It doesn't stratify properly. Any clue about parallelisation? – giac Sep 30 '16 at 11:38
  • Parallelisation with `dplyr` can be done with the `multidplyr` package on GitHub. But it's still kind of buggy, and it won't support `bootstrap`. – Axeman Sep 30 '16 at 11:43
  • Thank you, interesting. I hope it will be implemented in the future. – giac Sep 30 '16 at 11:46

1 Answer

Here is a solution using the data.table package (if you are looking for performance, you should really give it a go) and replicate:

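library(data.table)
setDT(dx)  # dx is assumed to hold the question's data (md)
# copy() keeps each replicate separate; := alone would keep overwriting dx by reference
rbindlist(replicate(10, copy(dx)[, perm := sample(gender), by = party], simplify = FALSE))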
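As a self-contained sketch of the same idea, the snippet below rebuilds the question's data directly (rather than through reshape2) and labels each replicate. The names `n_rep`, `perms`, and the `rep` column are illustrative choices of mine, and the seed is arbitrary. It takes a `copy()` of the table for each replicate, on the assumption that you want the replicates to be independent draws: `:=` updates a data.table by reference, so without the copy every list element would point at the same object.

```r
library(data.table)

# Rebuild the question's data (same columns as md in the question).
dx <- data.table(
  gender = rep(c("F", "M"), times = 3),
  party  = rep(c("Democrat", "Independent", "Republican"), each = 2),
  value  = c(762, 484, 327, 239, 468, 477)
)

set.seed(1)   # arbitrary seed, only for reproducibility
n_rep <- 10   # hypothetical number of replicates

# Each replicate permutes gender within party on its own copy of dx.
perms <- rbindlist(
  replicate(n_rep,
            copy(dx)[, perm := sample(gender), by = party],
            simplify = FALSE),
  idcol = "rep"   # integer index of the replicate each row came from
)
```

Within every `rep` and `party`, `perm` should contain exactly the same values as `gender`, just possibly in a different order, which is a quick way to check that the stratification held.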

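I am neither a dplyr user nor a piper, but if you are a "pipe fanatic" you can wrap the code above in functions and pipe it:

library(magrittr)  # provides %>%

# Add one stratified permutation column (modifies dx by reference)
PERM <- function(dx)
  dx[, perm := sample(gender), by = party]

# Stack n stratified permutations; copy() keeps the replicates separate
REPLICATE <- function(dx, n)
  rbindlist(replicate(n, copy(dx)[, perm := sample(gender), by = party], simplify = FALSE))

dx %>%
  PERM() %>%
  REPLICATE(10)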
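For readers who would rather stay in dplyr, the same replicate-and-stack pattern can probably be written without `bootstrap` at all. This is only a sketch, not something taken from the question's linked solution: `perm_once`, `perms`, and the `rep` column are names I made up for illustration, and the seed is arbitrary.

```r
library(dplyr)

# The question's data as a plain data frame.
md <- data.frame(
  gender = rep(c("F", "M"), times = 3),
  party  = rep(c("Democrat", "Independent", "Republican"), each = 2),
  value  = c(762, 484, 327, 239, 468, 477),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one random permutation of gender within each party.
perm_once <- function(df) {
  df %>%
    group_by(party) %>%
    mutate(perm = sample(gender)) %>%
    ungroup()
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Ten stratified permutations stacked into one data frame;
# .id records which replicate each row belongs to.
perms <- bind_rows(replicate(10, perm_once(md), simplify = FALSE), .id = "rep")
```

Because each call to `perm_once()` returns a new data frame, no explicit copy is needed here, unlike in the data.table version.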
agstudy