set.seed(59)
mean(sample(c(12,7,5),7,prob = c(.3,.3,.4),replace = T))
[1] 9.571429

set.seed(59)
mean(sample(c(5,7,12),7,prob = c(.4,.3,.3),replace = T))
[1] 8.142857

Shouldn't both snippets return the same sample mean? Why are the results different?

MrFlick
Olivia

2 Answers

Well, first consider the simpler case where you leave off the prob= argument:

set.seed(59) 
sample(c(12,7,5),7,replace = T)
# [1]  5 12 12  5  5 12  5
set.seed(59) 
sample(c(5,7,12),7,replace = T)
# [1] 12  5  5 12 12  5 12

Because you have different input, you get a different result. But also note that the sample function is really sampling from the vector's indexes, not the actual values of the vector. See how in the second result, you've basically just swapped the 5s and the 12s. The only thing that matters is the length of the input vector. You can see this if you sample the indexes alone:

set.seed(59) 
sample(1:3,7,replace = T)
# [1] 3 1 1 3 3 1 3

See how you still get the same "accaaca" pattern (the middle value is never picked). That's what setting the seed will do for you. You really only get the exact same result if all other parameters are identical.
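The index-sampling behaviour described above can be checked directly. This is a minimal sketch using base R only; `vals`, `idx`, `direct`, `via_index` and `reordered` are illustrative names that do not appear in the original answer. It draws positions with `sample.int()` and then subsets the vector with them, which should reproduce what `sample()` returns under the same seed.

```r
vals <- c(12, 7, 5)   # illustrative name for the vector from the question

# Sample the values directly
set.seed(59)
direct <- sample(vals, 7, replace = TRUE)

# Sample positions 1..length(vals), then map positions to values
set.seed(59)
idx <- sample.int(length(vals), 7, replace = TRUE)
via_index <- vals[idx]

identical(direct, via_index)          # expected: TRUE

# Reordering the values only relabels the same sequence of positions
set.seed(59)
reordered <- sample(rev(vals), 7, replace = TRUE)
identical(reordered, rev(vals)[idx])  # expected: TRUE
```

The second comparison is the 5-and-12 swap from the answer: the drawn positions are unchanged, only the values sitting at those positions differ.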

If you change the order of the values in the vector and swap the probabilities to match, you won't get the same observations from a pseudorandom number generator like the one R uses. It's simply not "smart" enough to see that those are the same statistical distribution. However, if you draw many samples over and over again, in the long run they will have similar means thanks to the law of large numbers.
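To see the long-run agreement, the two calls from the question can be repeated with a much larger sample size. This is only a sketch: the size `n = 1e5` and the names `mean_a`, `mean_b` and `expected` are assumptions chosen for illustration. Both means should land close to the theoretical value 12(0.3) + 7(0.3) + 5(0.4) = 7.7, even though the individual draws differ.

```r
n <- 1e5  # assumed sample size; large enough for the means to settle

set.seed(59)
mean_a <- mean(sample(c(12, 7, 5), n, prob = c(.3, .3, .4), replace = TRUE))

set.seed(59)
mean_b <- mean(sample(c(5, 7, 12), n, prob = c(.4, .3, .3), replace = TRUE))

# Theoretical mean of the distribution that both calls describe
expected <- sum(c(12, 7, 5) * c(.3, .3, .4))

# Compare the two simulated means with the theoretical one
c(mean_a = mean_a, mean_b = mean_b, expected = expected)
```

With only 7 draws, as in the question, the sample mean is far too noisy for the two orderings to agree.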

MrFlick

In addition to MrFlick's answer, I want to point out:

  1. Setting a seed means that each time you run a given line of code, for example your first line mean(sample(c(12,7,5),7,prob = c(.3,.3,.4),replace = T)), after set.seed(59), the output should be [1] 9.571429 on every machine on earth and beyond, provided the R version and RNG settings are the same.
  2. On the other hand, if you use the same seed set.seed(59) with different input, as in your second line mean(sample(c(5,7,12),7,prob = c(.4,.3,.3),replace = T)), you will get a different output.
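The first point can be checked with a small sketch. `draw_mean` is a hypothetical helper introduced here for illustration only; it resets the seed and reruns the first call from the question. No printed value is shown because the exact number can depend on the R version and RNG settings.

```r
# Hypothetical helper: reset the seed, then rerun the question's first call
draw_mean <- function(seed) {
  set.seed(seed)
  mean(sample(c(12, 7, 5), 7, prob = c(.3, .3, .4), replace = TRUE))
}

first  <- draw_mean(59)
second <- draw_mean(59)

# Same seed and same call give the same result within one R session
identical(first, second)
```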
TarJae