I want to simulate a random sample with a nested factor. The factor Dept has two levels, A and B. Level A has two nested levels, A1 and A2; level B has three nested levels, B1, B2 and B3. I want to simulate a random sample covering 2022-01-01 to 2022-01-31 using R. Part of the desired output is given below (2022-01-01 to 2022-01-03 only, for reference).

library(tibble)

set.seed(12345)
df1 <-
  tibble(
    Date   = c(rep("2022-01-01", 5), rep("2022-01-02", 4), rep("2022-01-03", 4))
  , Dept   = c("A", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "B", "B")
  , Prog   = c("A1", "A2", "B1", "B2", "B3", "A1", "B1", "B2", "B3", "A1", "A2", "B2", "B3")
  , Amount = runif(n = 13, min = 50000, max = 100000) 
  )

df1
#> # A tibble: 13 x 4
#>    Date       Dept  Prog  Amount
#>    <chr>      <chr> <chr>  <dbl>
#>  1 2022-01-01 A     A1    86045.
#>  2 2022-01-01 A     A2    93789.
#>  3 2022-01-01 B     B1    88049.
#>  4 2022-01-01 B     B2    94306.
#>  5 2022-01-01 B     B3    72824.
#>  6 2022-01-02 A     A1    58319.
#>  7 2022-01-02 B     B1    66255.
#>  8 2022-01-02 B     B2    75461.
#>  9 2022-01-02 B     B3    86385.
#> 10 2022-01-03 A     A1    99487.
#> 11 2022-01-03 A     A2    51727.
#> 12 2022-01-03 B     B2    57619.
#> 13 2022-01-03 B     B3    86784.
MYaseen208
  • Is the `df1` your expected output or input data – akrun Feb 18 '22 at 19:19
  • Similar to `df1`, but from `2022-01-01` to `2022-01-31`. – MYaseen208 Feb 18 '22 at 19:21
  • Do you need `crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"), Dept = c("A", "B"), Prog = 1:2) %>% mutate(Prog = str_c(Dept, Prog), Amount = runif(n = n(), min = 50000, max = 100000))`? – akrun Feb 18 '22 at 19:24
  • @akrun: Your code produces output very close to what I want, with two exceptions: **(1)** `B` has three nested levels and **(2)** some combinations may be missing for some dates. – MYaseen208 Feb 18 '22 at 19:29
  • Your second condition is not clear. What are the minimum and maximum numbers of combinations that should be present? – akrun Feb 18 '22 at 19:30
  • The min/max sampling 'n' is not clear. Otherwise `crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"), Dept = c("A", "B"), Prog = 1:3) %>% mutate(Prog = str_c(Dept, Prog)) %>% filter(Prog != "A3") %>% group_by(Date, Dept) %>% slice_sample(n = 2) %>% mutate(Amount = runif(n = n(), min = 50000, max = 100000))` – akrun Feb 18 '22 at 19:33
  • In your example, you created the first date with 5 elements, the second with 4 and the third with 4. Can you tell me the logic behind those numbers? – akrun Feb 18 '22 at 19:35
  • Just **Random**. There is **NO Pattern**. – MYaseen208 Feb 18 '22 at 19:37
  • But there would still be some logic for the min/max for each Date, right? – akrun Feb 18 '22 at 19:39
  • Your previous comment serves the purpose. Please post it as an answer. – MYaseen208 Feb 18 '22 at 19:41

1 Answer

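To sample randomly, build the full grid of Date/Dept/Prog combinations with `crossing`, drop the non-existent program `A3` with `filter`, and then, within each `Date` group, use `slice` to keep a randomly chosen number of leading rows.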

library(dplyr)
library(tidyr)
library(stringr)

# Full grid: every date x department x program index 1..3
crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"),
                    by = "1 day"),
         Dept = c("A", "B"), Prog = 1:3) %>%
  mutate(Prog = str_c(Dept, Prog)) %>%
  # Dept A has only two programs, so remove A3
  filter(Prog != "A3") %>%
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>%
  group_by(Date) %>%
  # Draw k from 1..(rows in this date) and keep the first k rows
  slice(seq_len(sample(row_number(), 1))) %>%
  ungroup()

-output

# A tibble: 102 × 4
   Date       Dept  Prog  Amount
   <date>     <chr> <chr>  <dbl>
 1 2022-01-01 A     A1    83964.
 2 2022-01-01 A     A2    93428.
 3 2022-01-01 B     B1    85187.
 4 2022-01-01 B     B2    79144.
 5 2022-01-01 B     B3    65784.
 6 2022-01-02 A     A1    86014.
 7 2022-01-03 A     A1    76060.
 8 2022-01-03 A     A2    56412.
 9 2022-01-03 B     B1    87365.
10 2022-01-03 B     B2    66169.
# … with 92 more rows
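
Because `slice(seq_len(...))` keeps the first k rows of each date, the combinations that go missing are always the trailing ones (B3 first, then B2, and so on). The sample `df1` in the question instead drops rows from the middle, such as A2 on 2022-01-02 and B1 on 2022-01-03. If that behaviour is wanted, one possible variant is to sample the row positions themselves. This is only a sketch: the lookup table `progs`, the object name `sim`, and the seed are assumptions introduced here for illustration, and it assumes at least one row should survive per date.

```r
library(dplyr)
library(tidyr)

set.seed(12345)  # assumed seed, only to make the sketch reproducible

# Hypothetical lookup table spelling out the nesting: A has 2 programs, B has 3
progs <- tibble(
  Dept = c("A", "A", "B", "B", "B"),
  Prog = c("A1", "A2", "B1", "B2", "B3")
)

dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day")

sim <- expand_grid(Date = dates, Prog = progs$Prog) %>%
  left_join(progs, by = "Prog") %>%
  select(Date, Dept, Prog) %>%
  group_by(Date) %>%
  # Pick a random number of rows (1..n) and then a random subset of that size;
  # sort() keeps the surviving rows in their original order
  slice(sort(sample(n(), size = sample(n(), 1)))) %>%
  ungroup() %>%
  mutate(Amount = runif(n = n(), min = 50000, max = 100000))

sim
```

Every date keeps between one and five rows, and any of the five Dept/Prog combinations can be the one that is absent. To allow dates with no rows at all, the inner `sample(n(), 1)` could be replaced by a draw that includes zero.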
akrun