
This question is about "sample_n" from dplyr in R.
https://dplyr.tidyverse.org/reference/sample.html

For reproducibility, I should set a seed so that someone else can get my exact results.

Is there a built-in way to set the seed for "sample_n"? Or is it something that I set in the environment, which "sample_n" then respects?

Seeding is not built into the "sample_n" function itself. There are two options:

  • The base R "set.seed" function, which sets the seed for the whole session [1]
  • The 'withr' package, which can run a piece of code under a temporary seed [2]
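
The two options above can be sketched as follows. This is a minimal illustration, assuming the dplyr and withr packages are installed; the seed value 42, the sample size, and the variable names are arbitrary choices, not part of the original question.

```r
library(dplyr)

# Option 1: set the global seed immediately before each call.
set.seed(42)
a <- sample_n(mtcars, 5)

set.seed(42)
b <- sample_n(mtcars, 5)

# Option 2: scope the seed to a single expression with withr::with_seed().
# The seed applies only while the expression is evaluated.
c1 <- withr::with_seed(42, sample_n(mtcars, 5))
c2 <- withr::with_seed(42, sample_n(mtcars, 5))

# Each pair was drawn under the same seed, so each pair should match.
identical(a, b)
identical(c1, c2)
```

The withr variant may be preferable inside functions or scripts where changing the session-wide seed as a side effect is undesirable.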


user20650
EngrStudent
    There is nothing special about `sample_n`. For reproducibility, follow the same steps as you would for any other random function: `set.seed(any_number); sample_n(mtcars, 1)` will always give the same result. Did you try that? – Ronak Shah Aug 16 '20 at 23:49

2 Answers


The dplyr::sample_n documentation states:

This is a wrapper around sample.int() to make it easy to select random rows from a table. It currently only works for local tbls.

So sample_n calls sample.int under the hood, which means that R's standard random number generator is used and that you can use set.seed for reproducibility.
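
As a rough illustration of that mechanism, here is a base R sketch that seeds the generator and draws row indices with sample.int directly. It is not dplyr's actual implementation; the seed value 123 and the variable names are arbitrary.

```r
# Base R only; no packages required.
n <- nrow(mtcars)

# Seed, then draw 5 row indices.
set.seed(123)
idx1 <- sample.int(n, 5)

# Re-seed with the same value and draw again.
set.seed(123)
idx2 <- sample.int(n, 5)

# The same seed yields the same indices, and therefore the same rows.
identical(idx1, idx2)
identical(mtcars[idx1, ], mtcars[idx2, ])
```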

Waldi
    Maybe add to the answer that `set.seed(123)` needs to be called *each time* before `sample_n` is performed. – Paul Dec 14 '20 at 16:57

Does this example help? In it, I am using set.seed and the mtcars dataset.

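library(dplyr)

# Seed, then sample: this draw is reproducible
set.seed(1)
x <- mtcars
sample_n(x, 10)

# No set.seed() here: the RNG state has advanced, so the rows differ
sample_n(x, 10)

# Re-seed with the same value: same rows as the first draw
set.seed(1)
sample_n(x, 10)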
Eric