0

I have a data.frame like this:

> mydata
ID ID2 ID3 X1 X2
1  1    1
2  2    1
3  3    1
4  1    2
5  2    2
6  1    3
7  2    3
8  3    3
9  4    3
10 1    4
11 2    4

I want to take a random row(random ID2) for each ID3, so I can have a new database like this:

> mydata2
ID ID2 ID3 X1 X2
1   3    1
2   2    2
3   3    3
4   1    4

Can I do this in sqldf? Or other code?

Arun
  • 116,683
  • 26
  • 284
  • 387
ToToRo
  • 373
  • 1
  • 5
  • 12

2 Answers2

2

With sqldf:

library(sqldf)
sqldf("select max(random(*)) r, * from mydata group by ID3")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

"Or other code", using apply:

#dummy data
df <- read.table(text="ID ID2 ID3
1  1    1
2  2    1
3  3    1
4  1    2
5  2    2
6  1    3
7  2    3
8  3    3
9  4    3
10 1    4
11 2    4",header=TRUE)

#using lapply
do.call(rbind,
        lapply(split.data.frame(df,df$ID3),
               function(d) d[sample(1:nrow(d),1),]))
zx8754
  • 52,746
  • 12
  • 114
  • 209