
Hello, I hope this is not a duplicate question; it is probably a very simple one. I couldn't find the answer and I can't solve it by myself.

I have a data frame like the one below, and I need to randomly select one row within each "id_lote". The "id_pix" values are unique, but the "id_lote" values are repeated and the groups (id_lote) have different sizes. The result should be a subset data frame with as many rows as there are id_lote values, with each row chosen at random. I'm using the sample command for other random selections, but I can't make it work for this problem. If I use the unique command, the subset won't be random... Thanks in advance!

id_pix id_lote clase   f1   f2
45       4      Sg    2460 2401
46       4      Sg    2620 2422
47       4      Sg    2904 2627
48       5      M     2134 2044
49       5      M     2180 2104
50       5      M     2127 2069
83      11      S     2124 2062
84      11      S     2189 2336
85      11      S     2235 2162
86      11      S     2162 2153
87      11      S     2108 2124
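To make the answers below easy to try, here is a sketch that rebuilds the example table as a data frame. The name `DF` is an assumption chosen to match the first answer; `id_lote` is stored as numeric and `clase` as character.

```r
# Example data from the question, one vector per column
DF <- data.frame(
  id_pix  = c(45:50, 83:87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11),
  clase   = c(rep("Sg", 3), rep("M", 3), rep("S", 5)),
  f1      = c(2460, 2620, 2904, 2134, 2180, 2127, 2124, 2189, 2235, 2162, 2108),
  f2      = c(2401, 2422, 2627, 2044, 2104, 2069, 2062, 2336, 2162, 2153, 2124),
  stringsAsFactors = FALSE
)

# Group sizes differ: 3 rows for lote 4, 3 for lote 5, 5 for lote 11
table(DF$id_lote)
```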
Jilber Urbina
Camilo

3 Answers


With only base R, you could use `ave`, for example:

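# Index into x rather than calling sample(x, 1): for a one-row group, sample(x, 1) would draw from 1:x
DF[!!ave(seq_along(DF$id_lote), DF$id_lote, FUN=function(x) x[sample.int(length(x), 1)] == x),]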
#id_pix id_lote clase   f1   f2
#3     47       4    Sg 2904 2627
#6     50       5     M 2127 2069
#7     83      11     S 2124 2062
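A caveat with `sample(x, 1)` inside `ave`: when a group has a single row, `x` is one number, and R's `sample()` then draws from `1:x` instead of returning `x`, so that group can end up with no row selected. The `?sample` help page suggests a `resample()` helper for exactly this case. A sketch of the `ave` approach using it, assuming the example data is called `DF`:

```r
# Example data from the question
DF <- data.frame(
  id_pix  = c(45:50, 83:87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11),
  f1      = c(2460, 2620, 2904, 2134, 2180, 2127, 2124, 2189, 2235, 2162, 2108)
)

# Helper from R's ?sample examples: always samples from the elements of x,
# even when x has length 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

sample(7, 1)    # draws from 1:7, so it may return any of 1, ..., 7
resample(7, 1)  # always returns 7

# ave() splits the row positions by id_lote; within each group exactly one
# position compares equal to the randomly drawn one
keep <- as.logical(ave(seq_len(nrow(DF)), DF$id_lote,
                       FUN = function(x) resample(x, 1) == x))
picked <- DF[keep, ]
picked
```

`ave()` writes the logical results back into the integer vector of row positions, so they come back as 0/1; `as.logical()` (or the `!!` in the answer) turns them back into a row filter.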

Or with dplyr, you could use sample_n:

library(dplyr)
DF %>% group_by(id_lote) %>% sample_n(1)
#Source: local data frame [3 x 5]
#Groups: id_lote
#
#id_pix id_lote clase   f1   f2
#1     46       4    Sg 2620 2422
#2     48       5     M 2134 2044
#3     85      11     S 2235 2162
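For reference, dplyr 1.0.0 and later mark `sample_n()` as superseded by `slice_sample()`, so the same call would be written with `slice_sample(n = 1)`. Without dplyr, the same "group, then take one random row" idea can be sketched in base R with `split()`, `lapply()` and `do.call(rbind, ...)`; the data frame name `DF` is an assumption:

```r
# Example data from the question
DF <- data.frame(
  id_pix  = c(45:50, 83:87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11),
  clase   = c(rep("Sg", 3), rep("M", 3), rep("S", 5)),
  stringsAsFactors = FALSE
)

# One data frame per id_lote
groups <- split(DF, DF$id_lote)

# From each group keep one row chosen uniformly at random;
# sample.int(nrow(g), 1) is safe even for a one-row group
one_per_group <- lapply(groups, function(g) g[sample.int(nrow(g), 1), , drop = FALSE])

# Stack the chosen rows back into a single data frame
picked <- do.call(rbind, one_per_group)
rownames(picked) <- NULL
picked
```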
talat
  • Thanks! I had to update my R version, but the dplyr package is great, and it works perfectly for my issue! – Camilo Jan 22 '15 at 22:44

data.table works pretty well here:

library(data.table)
setDT(data) # Convert data to a data.table in place

data[, .SD[sample(1:.N,1)], by=.(id_lote)]
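Inside `[.data.table`, `.SD` is the subset of rows for the current `by` group and `.N` is its row count, so `.SD[sample(1:.N, 1)]` returns one random row per `id_lote`. For readers without data.table, a rough base R analogue of that per-group step can be sketched with `by()`; the name `DF` is an assumption:

```r
# Example data from the question
DF <- data.frame(
  id_pix  = c(45:50, 83:87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11),
  f1      = c(2460, 2620, 2904, 2134, 2180, 2127, 2124, 2189, 2235, 2162, 2108)
)

# by() calls the function once per id_lote with that group's rows (like .SD);
# nrow(g) plays the role of .N
per_group <- by(DF, DF$id_lote, function(g) g[sample.int(nrow(g), 1), ])

# Combine the one-row data frames into a single result
picked <- do.call(rbind, per_group)
picked
```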

Mike.Gahan
# Shuffle the rows, then keep the first occurrence of each id_lote
subset(df[sample(nrow(df)), ], !duplicated(id_lote))
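The idea behind this one-liner: after a random permutation of the rows, the first row seen for each `id_lote` is a uniformly random member of that group, so `!duplicated()` on the shuffled data picks one random row per group. One caveat from base R's semantics: `within()` evaluates its expression and returns the whole (possibly modified) data frame, so an unassigned `!duplicated(id_lote)` inside `within()` filters nothing; the filter has to go through `subset()` or `[`. A sketch, with `DF` as an assumed name for the data:

```r
# Example data from the question
DF <- data.frame(
  id_pix  = c(45:50, 83:87),
  id_lote = c(4, 4, 4, 5, 5, 5, 11, 11, 11, 11, 11)
)

# Randomly permute the rows
shuffled <- DF[sample(nrow(DF)), ]

# Keep the first row seen for each id_lote in the shuffled order
picked <- shuffled[!duplicated(shuffled$id_lote), ]
picked

# For comparison: within() discards the unassigned logical and returns all rows
unfiltered <- within(shuffled, !duplicated(id_lote))
nrow(unfiltered)  # 11, the same as nrow(DF)
```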
lukeA
  • Thought the same about the `ave` .. ;-) – lukeA Jan 22 '15 at 20:15
  • I also tried this one: `prueba <- within(fr5[sample(1:nrow(fr5), size = nrow(fr5)), ], !duplicated(id_lote))`, but `dim(prueba)` gives `[1] 17451 26`, while the result should be 2080 by 26, since I have 2080 id_lote values in this data frame. I won't make this any longer, because my problem is solved with the dplyr package. I'm new here and don't know the Q&A system yet... I just wanted to say thanks anyway – Camilo Jan 22 '15 at 22:48