I have a data frame like the following:

c1 c2
1 2
1 3
2 4
2 5
2 2
3 1
3 2
...

I want one row per unique c1 value. When several rows share the same c1 value, the c2 value should be chosen from among those rows with equal probability. For example, the final result could be:

c1 c2
1 2
2 2
3 2
...

"A random choice of c2 for each possible value of c1" is what I want.

Qiang Li
  • A clearer explanation is needed in order for this question to be comprehensible. – whuber May 29 '13 at 19:04
  • which part is not clear? – Qiang Li May 29 '13 at 20:36
  • None of it! Please see my comment to Stefan Wager's reply for some different examples of how this could be interpreted. – whuber May 29 '13 at 21:15
  • I agree with @whuber: as it stands, what you're asking for can be interpreted in a number of ways. Can you try to clarify the situation you want with a really simple, completely worked example or two? (For example, one where c1 takes only two different values and there are only, say, three rows, where you explicitly describe the distribution you want to sample from.) – Glen_b May 29 '13 at 22:29

2 Answers

Here's a simple way to do it. Let's say your dataframe is called df.

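x <- unique(df$c1)
# Sample an index rather than the values: sample(v, 1) would draw from 1:v when v is a single number.
y <- sapply(x, function(arg) { v <- df$c2[df$c1 == arg]; v[sample.int(length(v), 1)] })
new_df <- data.frame(c1 = x, c2 = y)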
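
As a self-contained sketch of the same idea, the snippet below rebuilds the seven rows shown in the question (the "..." rows are left out) and draws one c2 per c1. The helper name `pick_one` is hypothetical, not part of base R; it samples an index so that a group holding a single number n is not expanded to 1:n by `sample()`.

```r
# Example data copied from the question (the "..." rows are omitted).
df <- data.frame(c1 = c(1, 1, 2, 2, 2, 3, 3),
                 c2 = c(2, 3, 4, 5, 2, 1, 2))

# Hypothetical helper: draw one element of v uniformly at random.
# Sampling an index avoids sample(v, 1) treating a single number n as 1:n.
pick_one <- function(v) v[sample.int(length(v), 1)]

x <- unique(df$c1)                                            # one entry per group
y <- sapply(x, function(arg) pick_one(df$c2[df$c1 == arg]))   # one c2 per group
new_df <- data.frame(c1 = x, c2 = y)
print(new_df)
```

The result varies from run to run; call `set.seed()` first if a reproducible draw is needed.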
Stefan Wager
  • To do what? What version of this question are you answering? The set of unique values of `c1` that appears in a simple random sample of `c2`? A random choice of `c2` for each possible value of `c1`? Something else? – whuber May 29 '13 at 19:43
  • "A random choice of c2 for each possible value of c1" is what I want. – Qiang Li May 29 '13 at 21:51
Here's an easy way to sample a value of c2 for each unique value of c1:

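# dat is the name of your data frame; sampling an index avoids sample(n, 1) drawing from 1:n when a group has a single value
aggregate(c2 ~ c1, dat, function(v) v[sample.int(length(v), 1)])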

  c1 c2
1  1  2
2  2  4
3  3  1
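
One rough way to check the "equal probability" requirement is to repeat the draw many times and tabulate which c2 value comes back for a given c1. The sketch below assumes the question's seven rows as `dat`, uses a hypothetical helper `pick_one`, and picks an arbitrary seed and repetition count; for c1 == 2 each of the three candidate values should appear in roughly a third of the draws.

```r
# Example data copied from the question (the "..." rows are omitted).
dat <- data.frame(c1 = c(1, 1, 2, 2, 2, 3, 3),
                  c2 = c(2, 3, 4, 5, 2, 1, 2))

# Hypothetical helper: draw one element of v uniformly at random.
pick_one <- function(v) v[sample.int(length(v), 1)]

set.seed(1)  # arbitrary seed, used only to make the check repeatable

# Repeat the per-group draw and record the c2 value chosen for c1 == 2.
draws <- replicate(3000, {
  res <- aggregate(c2 ~ c1, dat, pick_one)
  res$c2[res$c1 == 2]
})

# Relative frequency of each candidate value (2, 4 and 5).
freq <- prop.table(table(draws))
print(freq)
```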
Sven Hohenstein