
Assume I have an empty tibble my_tbl (0 rows) whose column types are known. For example:

library(tibble)
library(lubridate)

my_tbl <- tibble(
  x = integer(),
  y = character(),
  w = ymd(),
  z = list()
  )

How can I randomly populate my_tbl with n rows (say n = 10 for the sake of demonstration)?

If possible, I am looking for a simple piece of tidyverse code (but base R would be just fine too).

I understand that my requirements do not fully specify how to fill those rows, but anything that does not simply recycle a single value in each column would suffice. I'd like a simple way of randomly generating tibbles given known column types. The ultimate goal is to run tests on these generated tibbles.

Ramiro Magno
  • What would a randomly generated `list()` be? You must have some pool of "reasonable" values from which you want to draw for each of those columns. It's not clear exactly what your definition of "random" is here for each of these columns. – MrFlick Feb 06 '19 at 21:43
  • @MrFlick: A list of a single object, of whatever base type would be okay. – Ramiro Magno Feb 06 '19 at 21:47

1 Answer


You could write a function that calls sample to generate each column randomly:

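library(tibble)
library(purrr)

get_random_tbl <- function(tbl, n){
  # Use the first class of each column so that multi-class columns
  # (e.g. POSIXct) do not make map_chr() fail
  classes <- map_chr(tbl, ~ class(.x)[1])
  # Generate one random column per class; the names of `classes`
  # (the original column names) become the new column names
  map_dfc(
    classes,
    ~{
      switch(
        .x,
        integer = sample(1:100, n, replace = TRUE),
        character = sample(LETTERS, n, replace = TRUE),
        Date = sample(seq(as.Date('1999/01/01'), as.Date('2019/01/01'), by = "day"), n, replace = TRUE),
        list = sample(c(list("x"), list(1)), n, replace = TRUE),
        # Fail with an informative message for any class without a generator
        stop("No random generator defined for class: ", .x)
      )
    }
  )
}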


get_random_tbl(my_tbl, 3)
# A tibble: 3 x 4
#      x y     w          z        
#  <int> <chr> <date>     <list>   
#1    18 V     2015-11-30 <dbl [1]>
#2    34 D     2004-05-26 <chr [1]>
#3    76 B     2007-03-16 <chr [1]>
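
One possible variation on the function above, offered only as a sketch: keep the per-class generators in a named list instead of a switch, so that supporting a new column type means adding one entry. The names generators, random_tbl_from and template are hypothetical, and the value ranges are arbitrary choices for illustration.

```r
library(tibble)
library(purrr)

# Hypothetical lookup table: one random generator per supported column class.
# The value ranges below are arbitrary assumptions.
generators <- list(
  integer   = function(n) sample.int(100L, n, replace = TRUE),
  numeric   = function(n) runif(n),
  character = function(n) sample(LETTERS, n, replace = TRUE),
  logical   = function(n) sample(c(TRUE, FALSE), n, replace = TRUE),
  Date      = function(n) as.Date("1999-01-01") + sample.int(7305L, n, replace = TRUE),
  list      = function(n) sample(list("x", 1), n, replace = TRUE)
)

random_tbl_from <- function(tbl, n, gens = generators) {
  cols <- map(tbl, function(col) {
    cls <- class(col)[1]  # first class only, so multi-class columns still resolve
    gen <- gens[[cls]]
    if (is.null(gen)) stop("No generator defined for class: ", cls)
    gen(n)
  })
  as_tibble(cols)  # the named list of equal-length columns becomes a tibble
}

# Template with the same column types as my_tbl, plus a logical column
template <- tibble(
  x  = integer(),
  y  = character(),
  w  = as.Date(character()),
  z  = list(),
  ok = logical()
)

res <- random_tbl_from(template, 10)
res
```

A column whose class has no entry in the list (a factor, for instance) stops with an error naming the class, which makes missing generators easy to spot when the function is used in tests.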
dave-edison
  • Thanks for your example but I'd like some code that works for an arbitrary tibble with potentially more columns and different types. Your example is hardcoded to my example tibble. – Ramiro Magno Feb 06 '19 at 21:53
  • See my edit, you can add your tibble as an argument and then use a `switch` statement to generate the correct data based on type. This should work on an arbitrary tibble given you define a random data generation function for each possible type in the `switch` statement. – dave-edison Feb 06 '19 at 22:13
  • Can you explain how the formula expression `~{...}` is working there? – Ramiro Magno Feb 07 '19 at 08:01
  • That is a purrr shortcut for an anonymous function, it is equivalent to `function(...) {...}` – dave-edison Feb 07 '19 at 17:51
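
To illustrate the formula shortcut discussed in the last comment, here is a minimal sketch comparing it with an ordinary anonymous function. Inside the formula, .x refers to the current element; the variable names plus_formula, plus_function and add_one are made up for this example.

```r
library(purrr)

# purrr formula shortcut: .x is the element currently being mapped over
plus_formula <- map_dbl(1:3, ~ .x + 1)

# The same computation with an ordinary anonymous function
plus_function <- map_dbl(1:3, function(.x) .x + 1)

identical(plus_formula, plus_function)  # TRUE

# as_mapper() shows the conversion purrr performs on the formula
add_one <- as_mapper(~ .x + 1)
add_one(10)  # 11
```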