I would like to simulate revenue scenarios based on price and est_p (estimated probability) from the following df:
df <- data.frame(price = c(200, 100, 600, 20, 100),
                 est_p = c(0.9, 0.2, 0.8, 0.5, 0.6),
                 actual_sale = c(FALSE, TRUE, TRUE, TRUE, TRUE))
Revenue is the sum of the prices for which actual_sale is TRUE:
print(actual1 <- sum(df$price[df$actual_sale])) # Actual Revenue
[1] 820
I've created a function to simulate Bernoulli trials based on the est_p and price values:
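bernoulli <- function(df) {
  # One column per row of df; each column holds 1000 simulated revenues
  sapply(seq_len(nrow(df)), function(x) {
    prc <- df$price[x]
    p <- df$est_p[x]
    # Draw the price with probability p, otherwise 0
    sample(c(prc, 0), size = 1000, replace = TRUE, prob = c(p, 1 - p))
  })
}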
I then applied it to the sample df:
set.seed(100)
distr1 <- rowSums(bernoulli(df))
quantile(distr1)
0% 25% 50% 75% 100%
0 700 820 920 1020
This looks OK: the actual value equals the median. But when I apply the same function to a larger sample, df1000 (df replicated 1000 times), the actual revenue falls outside the range of the simulated values:
df1000 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
print(actual2 <- sum(df1000$price[df1000$actual_sale]))
[1] 820000
distr2 <- rowSums(bernoulli(df1000))
quantile(distr2)
0% 25% 50% 75% 100%
726780 744300 750050 754920 775800
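As a sanity check on the simulated quantiles, the mean and standard deviation of total revenue can also be computed analytically. This is only a minimal sketch, assuming each sale is an independent Bernoulli draw with probability est_p (which is what bernoulli() simulates). The helper revenue_moments and the names m1, m1000, z1 and z1000 are introduced here for illustration and are not part of the code above.

```r
df <- data.frame(price = c(200, 100, 600, 20, 100),
                 est_p = c(0.9, 0.2, 0.8, 0.5, 0.6),
                 actual_sale = c(FALSE, TRUE, TRUE, TRUE, TRUE))
df1000 <- do.call("rbind", replicate(1000, df, simplify = FALSE))

# Hypothetical helper: analytic mean and sd of total revenue,
# assuming independent Bernoulli sales with probability est_p
revenue_moments <- function(d) {
  c(mean = sum(d$price * d$est_p),
    sd   = sqrt(sum(d$price^2 * d$est_p * (1 - d$est_p))))
}

m1    <- revenue_moments(df)      # original 5 rows
m1000 <- revenue_moments(df1000)  # 5000 rows

# Actual revenue for each data frame
actual1 <- sum(df$price[df$actual_sale])
actual2 <- sum(df1000$price[df1000$actual_sale])

# Distance of the actual revenue from the expected revenue, in sd units
z1    <- unname((actual1 - m1["mean"]) / m1["sd"])
z1000 <- unname((actual2 - m1000["mean"]) / m1000["sd"])

print(rbind(df = m1, df1000 = m1000))
print(c(z1 = z1, z1000 = z1000))
```

Under these assumptions the expected revenue for the five-row df works out to 750, so the expected revenue for df1000 is 750000, close to the simulated median. The gap between actual and expected revenue grows in proportion to the number of replications, while the standard deviation grows only with its square root, so z1000 is sqrt(1000) times z1.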
Why is the actual revenue outside the range of the simulated values? Where did I make a mistake, and what is the correct solution to this problem?