random selection within groups

Question

Hello, I hope this is not a duplicate question; it is probably a very simple one, but I couldn't find the answer and I can't solve it by myself.

I have a data frame like the one below and I need to randomly select one row within each "id_lote". The "id_pix" values are unique, but "id_lote" values repeat, and the groups (id_lote) differ in size. My result should be a subset of the data frame with as many rows as there are distinct id_lote values, each row chosen at random. I'm using the sample command for other random selections, but I can't make it work for this issue, and if I use the unique command the subset won't be random. Thanks in advance!

id_pix id_lote clase   f1   f2
45       4      Sg    2460 2401
46       4      Sg    2620 2422
47       4      Sg    2904 2627
48       5      M     2134 2044
49       5      M     2180 2104
50       5      M     2127 2069
83      11      S     2124 2062
84      11      S     2189 2336
85      11      S     2235 2162
86      11      S     2162 2153
87      11      S     2108 2124

score 3 · Accepted Answer · answered Jan 22 '15 at 20:07

With only base R you could use ave for example:

> DF[!!ave(seq_along(DF$id_lote), DF$id_lote, FUN=function(x) x[sample.int(length(x), 1)] == x),]
#id_pix id_lote clase   f1   f2
#3     47       4    Sg 2904 2627
#6     50       5     M 2127 2069
#7     83      11     S 2124 2062
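To see why the ave approach works, here is a step-by-step sketch of the same idea. The data frame is rebuilt from the question's sample; the intermediate names `idx`, `keep` and `result` are only illustrative. One caveat worth knowing: `sample(x, 1)` treats a single number `x` as `1:x`, so a group with only one row could be dropped; indexing with `sample.int(length(x), 1)` avoids that.

```r
# Rebuild the sample data from the question
DF <- data.frame(
  id_pix  = c(45, 46, 47, 48, 49, 50, 83, 84, 85, 86, 87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11),
  clase   = c("Sg", "Sg", "Sg", "M", "M", "M", "S", "S", "S", "S", "S"),
  f1      = c(2460, 2620, 2904, 2134, 2180, 2127, 2124, 2189, 2235, 2162, 2108),
  f2      = c(2401, 2422, 2627, 2044, 2104, 2069, 2062, 2336, 2162, 2153, 2124)
)

set.seed(1)  # optional, only so the random draw can be repeated

# Row positions 1..nrow(DF); ave() will split these by id_lote
idx <- seq_along(DF$id_lote)

# Within each group, draw one position and compare every position to it.
# ave() returns the comparison as 1 (chosen) or 0 (not chosen).
keep <- ave(idx, DF$id_lote,
            FUN = function(x) x == x[sample.int(length(x), 1)])

# Keep only the flagged rows: one per id_lote
result <- DF[keep == 1, ]
print(result)
```

Since exactly one position per group is flagged, `result` has one row for each distinct id_lote.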

Or with dplyr, you could use sample_n:

library(dplyr)
> DF %>% group_by(id_lote) %>% sample_n(1)
#Source: local data frame [3 x 5]
#Groups: id_lote
#
#id_pix id_lote clase   f1   f2
#1     46       4    Sg 2620 2422
#2     48       5     M 2134 2044
#3     85      11     S 2235 2162
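For readers on dplyr 1.0.0 or later: `sample_n()` still works but is superseded by `slice_sample()`. A minimal sketch of the equivalent call, using a cut-down copy of the question's data (the name `picked` is only illustrative):

```r
library(dplyr)

# Cut-down copy of the question's sample data
DF <- data.frame(
  id_pix  = c(45, 46, 47, 48, 49, 50, 83, 84, 85, 86, 87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11)
)

set.seed(1)  # optional, only so the random draw can be repeated

# One random row per id_lote; ungroup() drops the grouping afterwards
picked <- DF %>%
  group_by(id_lote) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(picked)
```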

Thanks! I had to update my R version, but the dplyr package is great and it works perfectly for my issue. – Camilo, Jan 22 '15 at 22:44

score 2 · Answer 2 · answered Jan 22 '15 at 19:58

data.table works pretty well here

library(data.table)
setDT(data) #Convert data to a data.table

data[, .SD[sample(1:.N,1)], by=.(id_lote)]
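As a rough sketch of what this expression does: `.N` is the number of rows in the current group and `.SD` is that group's subset of the data, so `.SD[sample(.N, 1)]` keeps one randomly chosen row per group. The example below rebuilds part of the question's sample (the name `data` follows the answer; `picked`, `rows` and `picked_fast` are illustrative names). It also shows a row-index variant using `.I`, a common idiom that avoids building `.SD` for every group and may be faster on large tables.

```r
library(data.table)

# Cut-down copy of the question's sample data
data <- data.table(
  id_pix  = c(45, 46, 47, 48, 49, 50, 83, 84, 85, 86, 87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11)
)

set.seed(42)  # optional, only so the random draw can be repeated

# Same idea as the answer: one random row of .SD per group
picked <- data[, .SD[sample(.N, 1)], by = id_lote]

# Variant: collect one row index (.I) per group, then subset once
rows <- data[, .I[sample(.N, 1)], by = id_lote]$V1
picked_fast <- data[rows]

print(picked)
print(picked_fast)
```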

Mike.Gahan

score 1 · Answer 3 · answered Jan 22 '15 at 20:12

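# Shuffle the rows, then keep the first row seen for each id_lote.
# subset() is needed here: within() evaluates the condition but returns every row.
subset(df[sample(nrow(df)), ], !duplicated(id_lote))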

lukeA

Thought the same about the `ave` .. ;-) – lukeA Jan 22 '15 at 20:15
I also tried this one: `prueba <- within(fr5[sample(1:nrow(fr5), size = nrow(fr5)), ], !duplicated(id_lote))`, but `dim(prueba)` gives `[1] 17451 26`, whereas the dimensions of the result should be 2080 by 26, since I have 2080 distinct id_lote values in this DF. I won't pursue it further because my problem is solved with the dplyr package. I'm new here and don't know the question-and-answer system yet; I just want to say thanks anyway. – Camilo Jan 22 '15 at 22:48
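The row count reported in this comment is consistent with how `within()` behaves: it evaluates the expression inside the data frame but returns the whole data frame, so nothing is filtered. A small sketch on a cut-down copy of the question's data (the names `df`, `shuffled`, `all_rows` and `one_each` are illustrative) compares it with `subset()`, which does use the condition to drop rows:

```r
# Cut-down copy of the question's sample data
df <- data.frame(
  id_pix  = c(45, 46, 47, 48, 49, 50, 83, 84, 85, 86, 87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11)
)

set.seed(7)  # optional, only so the shuffle can be repeated

# Put the rows in random order
shuffled <- df[sample(nrow(df)), ]

# within() evaluates the condition but discards it: every row comes back
all_rows <- within(shuffled, !duplicated(id_lote))

# subset() filters on the condition: first (random) row per id_lote
one_each <- subset(shuffled, !duplicated(id_lote))

print(dim(all_rows))
print(dim(one_each))
```

Because the rows are shuffled first, the first row kept for each id_lote is a random member of that group.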
