I loaded a small sample of about 10 rows and about 10 columns (the excerpt below has 7 rows and 9 columns).
library(tidyverse)
# A tibble: 7 x 9
case_id scenario alert_number random_ref_id amount code_type is_cred_debt source type
<chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 2020500ZSJU45679007 Anomalies 8796964 xxxxdg6yht78lj 2137. 100 D xdd CASH
2 2020500ZSJU45679007 Anomalies 8796964 xxxxdg6yht78lj 2137. 100 D xdd CASH
3 2020500ZSJU45679007 Anomalies 8796964 xxxxdg6yht78lj 2137. 100 D xdd CASH
4 2020500ZSJU45679007 Anomalies 8796964 xxxxdg6yht78lj 2137. 100 D xdd CASH
5 2020500ZSJU45679111 Patterns 8678867 xxxykhkh67hhg 6000 200 C CFT WIRE
6 2020500ZSJU45679111 Patterns 8678867 xxxykhkh67hhg 7000 200 C CFT WIRE
7 2020500ZSJU45679111 Patterns 8678867 xxxykhkh67hhg 24000 200 C CFT WIRE
df <-
as.data.frame(
structure(
list(
case_id = c(
"2020500ZSJU45679007",
"2020500ZSJU45679007",
"2020500ZSJU45679007",
"2020500ZSJU45679007",
"2020500ZSJU45679111",
"2020500ZSJU45679111",
"2020500ZSJU45679111"
),
scenario = c(
"Anomalies",
"Anomalies",
"Anomalies",
"Anomalies",
"Patterns",
"Patterns",
"Patterns"
),
alert_number = c(8796964, 8796964, 8796964, 8796964, 8678867, 8678867, 8678867),
random_ref_id = c(
"xxxxdg6yht78lj",
"xxxxdg6yht78lj",
"xxxxdg6yht78lj",
"xxxxdg6yht78lj",
"xxxykhkh67hhg",
"xxxykhkh67hhg",
"xxxykhkh67hhg"
),
amount = c(2136.76, 2136.76, 2136.76, 2136.76, 6000, 7000, 24000),
code_type = c(100, 100, 100, 100, 200, 200, 200),
is_cred_debt = c("D", "D", "D", "D", "C", "C", "C"),
source = c("xdd", "xdd", "xdd", "xdd", "CFT", "CFT", "CFT"),
type = c("CASH", "CASH", "CASH", "CASH", "WIRE", "WIRE", "WIRE")
),
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -7L)
# (the readr `spec` attribute is omitted here; it is not needed to rebuild the data)
)
)
I would like to know whether there are techniques that, starting from this 10-row sample, can simulate a bigger sample of, say, 100 entries, where each observation is randomly generated.
Considering that:
- case_id is a random string for each observation
- scenario can either be Anomalies or Patterns
- alert_number is a random string, the same for each case_id
- random_ref_id is a random string, the same for each case_id
- amount can be a varying number between 0 and 100000
- code_type can either be 100 or 200, the same for each case_id
- is_cred_debt can either be D or C, the same for each case_id
- source can either be xdd or CFT, the same for each case_id
- type can either be CASH or WIRE, the same for each case_id
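To make the constraints above concrete, here is a rough sketch of one way they could be turned into a simulation. It is not a definitive approach. It first builds a table with one row per case, holding the attributes that must stay constant within a case_id, and then draws observations from it. Several things are assumptions not stated above: the number of distinct cases (`n_cases`), the string lengths, the 7-digit range for alert_number, the uniform distribution for amount, and treating scenario as constant per case (as it is in the sample). `rand_string()` is a hypothetical helper, not a library function.

```r
library(dplyr)

set.seed(42)    # arbitrary seed, only for reproducibility

n_rows  <- 100  # target number of simulated observations
n_cases <- 25   # ASSUMPTION: number of distinct cases, not specified above

# Hypothetical helper: n random alphanumeric strings of length len
rand_string <- function(n, len) {
  chars <- c(letters, LETTERS, 0:9)
  replicate(n, paste(sample(chars, len, replace = TRUE), collapse = ""))
}

# One row per case: everything that must be identical within a case_id
cases <- tibble(
  case_id       = rand_string(n_cases, 19),
  scenario      = sample(c("Anomalies", "Patterns"), n_cases, replace = TRUE),
  alert_number  = 999999L + sample.int(9000000L, n_cases),  # 7-digit numbers
  random_ref_id = rand_string(n_cases, 14),
  code_type     = sample(c(100, 200), n_cases, replace = TRUE),
  is_cred_debt  = sample(c("D", "C"), n_cases, replace = TRUE),
  source        = sample(c("xdd", "CFT"), n_cases, replace = TRUE),
  type          = sample(c("CASH", "WIRE"), n_cases, replace = TRUE)
)

# Draw observations from the case table, then add a row-level amount
sim <- cases %>%
  slice_sample(n = n_rows, replace = TRUE) %>%
  mutate(amount = round(runif(n(), min = 0, max = 100000), 2)) %>%
  select(case_id, scenario, alert_number, random_ref_id, amount,
         code_type, is_cred_debt, source, type)
```

Separating case-level attributes from row-level ones is what keeps alert_number, code_type and the rest consistent within each case_id; sampling every column independently per row would break that.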
I know how to do the reverse, i.e. draw a random sample of, say, 10 observations from an initial df of, say, 100 observations. What is not clear to me is how to generate a random simulation starting from this 10-observation sample.
Any hint would be much appreciated.
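For reference, the downsampling direction can be sketched with `dplyr::slice_sample()`. The data frame `big` is a made-up stand-in, not the real data. The same function with `replace = TRUE` can also draw more rows than the input has, but that only repeats rows already observed and generates no new values, so it is resampling rather than simulation.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility

# Toy stand-in for an initial df of 100 observations (values are made up)
big <- tibble(id = 1:100, amount = runif(100, min = 0, max = 100000))

# Downsampling: 10 of the 100 rows, drawn without replacement
small <- slice_sample(big, n = 10)

# Resampling with replacement: 100 rows, but only copies of the 10 sampled rows
resampled <- slice_sample(small, n = 100, replace = TRUE)
```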