select unique values with equal probability

Question

I have a data frame with columns c1 and c2, like the following:

I want one row for each unique c1 value. When several rows share the same c1 value, c2 should be taken from one of those rows, each chosen with equal probability. For example, the final result could be:

c1 c2
1 2
2 2
3 2
...

"A random choice of c2 for each possible value of c1" is what I want.

A clearer explanation is needed in order for this question to be comprehensible. — whuber, May 29 '13 at 19:04
None of it! Please see my comment to Stefan Wager's reply for some different examples of how this could be interpreted. — whuber, May 29 '13 at 21:15
I agree with @whuber - as it stands what you're asking for can be interpreted in a number of ways. Can you try to clarify the situation you want in a really simple completely worked example or two? (for example, one where c1 only takes two different values, and there are only say three rows, where you explicitly describe the distribution you want to sample from) — Glen_b, May 29 '13 at 22:29
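Taking up Glen_b's suggestion, here is one way to write the interpretation the OP later confirms ("a random choice of c2 for each possible value of c1") as an explicit distribution. This is an editorial formalization, not the OP's own statement, and the three rows used are made up: (c1, c2) = (1, 2), (1, 5), (2, 7). Let $R_k$ be the set of rows with c1 = $k$; independently for each $k$, one row of $R_k$ is drawn uniformly and its c2 is reported.

```latex
\begin{aligned}
P(\text{row } r \text{ is chosen for } k) &= \frac{1}{|R_k|}, \qquad r \in R_k, \\
P(c_2 = v \text{ is reported for } k) &= \frac{|\{\, r \in R_k : c_2(r) = v \,\}|}{|R_k|}.
\end{aligned}
% Made-up rows: |R_1| = 2 and |R_2| = 1, so
P(c_2 = 2 \mid k = 1) = P(c_2 = 5 \mid k = 1) = \tfrac{1}{2}, \qquad P(c_2 = 7 \mid k = 2) = 1.
```

Under this reading, a c2 value that appears in two rows of the same group is twice as likely as one that appears once, which is also what the `sample`-based answers below produce.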

score 1 · Accepted Answer · answered May 29 '13 at 19:24


Here's a simple way to do it. Let's say your data frame is called df.

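x <- unique(df$c1)
# Sample an index rather than the values themselves: sample(v, 1) treats a
# single number n >= 1 as 1:n, which breaks c1 values that have only one row.
y <- sapply(x, function(arg) { v <- df$c2[df$c1 == arg]; v[sample.int(length(v), 1)] })
new_df <- data.frame(c1 = x, c2 = y)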
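A caveat that applies to `sample(v, 1)`: if `v` has length 1 and is a number n >= 1, R's `sample()` draws from `1:n` instead of returning `v`, so a c1 value that occurs in only one row can get a c2 value that is not in the data. The sketch below shows this on a made-up three-row data frame and uses a hypothetical helper, `pick_one()`, that samples an index with `sample.int()` instead.

```r
# Hypothetical helper: return one element of v, each position equally likely.
# Sampling an index avoids sample()'s special case where a single number
# n >= 1 is treated as 1:n.
pick_one <- function(v) v[sample.int(length(v), 1)]

# Made-up example data: c1 = 1 has two rows, c1 = 2 has a single row.
df <- data.frame(c1 = c(1, 1, 2), c2 = c(2, 5, 7))

set.seed(1)

# The pitfall: with only one candidate, sample(7, 1) draws from 1:7.
draws <- replicate(1000, sample(df$c2[df$c1 == 2], 1))
any(draws != 7)   # values other than 7 show up

# The index-based version always returns 7 for c1 = 2.
safe <- replicate(1000, pick_one(df$c2[df$c1 == 2]))
all(safe == 7)

# One random c2 per unique c1, as in the answer above but using pick_one().
x <- unique(df$c1)
y <- sapply(x, function(arg) pick_one(df$c2[df$c1 == arg]))
new_df <- data.frame(c1 = x, c2 = y)
new_df
```

`pick_one()` follows the same idea as the `resample()` function in the Examples section of R's `?sample` help page.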


Stefan Wager


To do what? What version of this question are you answering? The set of unique values of `c1` that appears in a simple random sample of `c2`? A random choice of `c2` for each possible value of `c1`? Something else? – whuber May 29 '13 at 19:43

"A random choice of c2 for each possible value of c1" is what I want. – Qiang Li May 29 '13 at 21:51

score 0 · Answer 2 · answered May 31 '13 at 09:20


Here's an easy way to sample a value of c2 for each unique value of c1:

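# sample an index so that a group with a single c2 value (e.g. 4) is not treated as 1:4
aggregate(c2 ~ c1, dat, function(v) v[sample.int(length(v), 1)]) # dat is the name of your data frame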

  c1 c2
1  1  2
2  2  4
3  3  1
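As an alternative to calling a function once per group, one can shuffle the whole data frame once and keep the first row seen for each c1 value. A minimal base-R sketch, using a made-up data frame named `dat` (the column names c1 and c2 come from the question; the values are invented):

```r
# Made-up example data (hypothetical); substitute your own data frame.
dat <- data.frame(c1 = c(1, 1, 1, 2, 3, 3),
                  c2 = c(2, 4, 6, 4, 1, 2))

set.seed(42)

# Shuffle the rows uniformly at random, then keep the first row seen for
# each c1 value. Every row in a group is equally likely to come first, so
# each is kept with probability 1 / (number of rows in that group).
shuffled <- dat[sample(nrow(dat)), ]
result <- shuffled[!duplicated(shuffled$c1), ]

# Sort by c1 and reset row names for display.
result <- result[order(result$c1), ]
rownames(result) <- NULL
result

# Rough empirical check for c1 = 1 (three candidate rows): each of the
# values 2, 4 and 6 should be picked about one third of the time.
picks <- replicate(3000, {
  s <- dat[sample(nrow(dat)), ]
  s$c2[!duplicated(s$c1) & s$c1 == 1]
})
table(picks) / length(picks)
```

Like the `sample`-based answers, this gives each row equal probability, so a c2 value that appears twice within a group is twice as likely to be picked. If each distinct c2 value should be equally likely instead, removing duplicate rows first (for example with `unique(dat)`) would handle that.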


Sven Hohenstein

