I want to understand why I am not getting a probability distribution when I simulate draws from a normal distribution with rnorm():
library(tidyverse)
df <- mtcars # data
df$sd <- sd(df$mpg) # standard deviation of mpg across all cars, used for every car
f <- function(n1, s1, n2, s2){
  mean(rnorm(10000, n1, s1) < rnorm(10000, n2, s2)) # simulated P(car 1 has lower mpg than car 2)
}
g <- Vectorize(f, c("n1", "s1", "n2", "s2"))
set.seed(123)
idx <- seq_len(nrow(df)) # outer() takes only two vectors, so iterate over row indices
res <- outer(idx, idx, function(i, j) g(df$mpg[i], df$sd[i], df$mpg[j], df$sd[j]))
dimnames(res) <- list(row.names(df), row.names(df))
res <- data.frame(res, check.names = FALSE) # keep car names such as "Mazda RX4" unchanged
res <- tibble::rownames_to_column(res, 'p1')
datalong_2 <- tidyr::gather(res, 'p2', 'value', 2:33) # long format: p1, p2, value
I ran this simulation, but I am not getting an actual probability distribution. My goal is to estimate the probability that one car has lower mpg than another car. However, the probabilities do not add up to one. I expect the two directions to add up to one, or to slightly less than one if a tie can happen.
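Under the model the simulation uses (each car's mpg independently normal, with mean equal to its observed mpg and standard deviation $\sigma$), the probability has a closed form. It shows that the two directions are exact complements and that ties have probability zero for continuous draws, so any deviation from 1 in the simulated values is Monte Carlo noise from using separate, independent draws for each direction:

$$
\begin{aligned}
X_i &\sim \mathcal{N}(\mu_i, \sigma_i^2), \quad X_j \sim \mathcal{N}(\mu_j, \sigma_j^2) \text{ independent}
\;\Rightarrow\; X_j - X_i \sim \mathcal{N}\!\left(\mu_j - \mu_i,\ \sigma_i^2 + \sigma_j^2\right), \\
P(X_i < X_j) &= \Phi\!\left(\frac{\mu_j - \mu_i}{\sqrt{\sigma_i^2 + \sigma_j^2}}\right) = \Phi(z), \\
P(X_i < X_j) + P(X_j < X_i) &= \Phi(z) + \Phi(-z) = 1, \qquad P(X_i = X_j) = 0.
\end{aligned}
$$

Only the pair $P(X_i < X_j)$, $P(X_j < X_i)$ is complementary. Summing $P(X_i < X_j)$ over all other cars $j$ (one row of the result) is not a distribution and is not expected to equal 1.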
For example, the probability that Mazda RX4 has lower mpg than Mazda RX4 Wag is 0.5094, while the probability that Mazda RX4 Wag has lower mpg than Mazda RX4 is 0.5029; these sum to 1.0123. How can I change this code so that the probabilities of one car having lower mpg than another form an actual probability distribution?
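A minimal sketch of two possible fixes, assuming the same model as the simulation above (each car's mpg ~ Normal(observed mpg, sd(mtcars$mpg)), independently across cars). Option 1 replaces the simulation with the closed-form normal probability via pnorm(); option 2 keeps the simulation but reuses one set of draws for both directions (common random numbers). `pair_prob` is a hypothetical helper name, not part of any package.

```r
# Assumed model: each car's mpg ~ Normal(observed mpg, sd(mtcars$mpg)), independent.
df <- mtcars
df$sd <- sd(df$mpg)
idx <- seq_len(nrow(df))

# Option 1 (closed form):
# P(car i lower than car j) = pnorm((mu_j - mu_i) / sqrt(s_i^2 + s_j^2))
res <- outer(idx, idx, function(i, j) {
  pnorm((df$mpg[j] - df$mpg[i]) / sqrt(df$sd[i]^2 + df$sd[j]^2))
})
dimnames(res) <- list(row.names(df), row.names(df))
res["Mazda RX4", "Mazda RX4 Wag"]  # 0.5, since both cars have mpg = 21

# Every pair is complementary: res[i, j] + res[j, i] == 1 (up to rounding)
stopifnot(isTRUE(all.equal(res + t(res), matrix(1, nrow(res), ncol(res)),
                           check.attributes = FALSE)))

# Option 2 (simulation with shared draws): the SAME draws are used for both
# directions, so the estimates are complementary apart from exact ties.
pair_prob <- function(n1, s1, n2, s2, nsim = 10000) {  # hypothetical helper
  a <- rnorm(nsim, n1, s1)  # simulated mpg for car A
  b <- rnorm(nsim, n2, s2)  # simulated mpg for car B
  c(a_lower = mean(a < b), b_lower = mean(b < a), tie = mean(a == b))
}
set.seed(123)
p <- pair_prob(df["Mazda RX4", "mpg"], df["Mazda RX4", "sd"],
               df["Mazda RX4 Wag", "mpg"], df["Mazda RX4 Wag", "sd"])
p
sum(p)  # 1 up to floating-point rounding: each draw is <, > or ==
```

The closed form is exact and avoids simulation noise altogether. With the shared-draw version, the individual estimates still carry Monte Carlo error (standard error roughly sqrt(p * (1 - p) / nsim)), but the two directions can no longer disagree with each other.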