Computing probabilities in R

Question

I have two questions that I'd like to solve using R.

I have a vector of values whose distribution is unknown.

1. How do I calculate the probability of one of the values in the vector?
2. How do I estimate the probability of one value occurring by simulating 1000 times?

My test data is as follows:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

Grateful for any assistance.

What did you try? Asking two questions in the same post comes across as "do it for me". — agstudy, Aug 30 '18 at 08:07
You say you want to simulate, but you don't mention how your data is derived. Perhaps you could use `sample()` to generate a random set of values e.g. `sample(1:10,size=20,replace=T)` — P1storius, Aug 30 '18 at 08:10
"I have a vector of values which distribution is unknown." So, your values are a sample from a distribution? Is anything known about the distribution? E.g., your example seems to indicate that it is a discrete distribution of positive integer values. This can't be solved without additional information or assumptions. — Roland, Aug 30 '18 at 08:32

000andy8484 · Answer 1 · 2018-08-30T08:46:26.310

To calculate the probability of a value without assuming anything about the unknown distribution, you can compute the empirical relative frequency of each value:

prop.table(table(values_all))

which outputs:

values_all
   1    2    3    4    5    6    7 
0.15 0.25 0.10 0.05 0.20 0.10 0.15
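
To pull out the probability of a single value, such as the 5 asked about in the question, the table can be indexed by name. A minimal sketch, reusing `values_all` and `prob_to_find` from the question; the names `emp_probs` and `p` are only illustrative:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# Relative frequency of every distinct value in the vector
emp_probs <- prop.table(table(values_all))

# Table entries are named by character strings, so look the value up
# by name rather than by position; as.numeric() drops the table attributes
p <- as.numeric(emp_probs[as.character(prob_to_find)])
p
```

Indexing by position would only work here by coincidence, because the observed values happen to be 1 to 7 with none missing.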

Alternatively, you can assume a distribution after inspecting your vector. For example, under a continuous uniform(1, 7), the cumulative probability P(X <= 3) would be:

> punif(3, min = 1, max = 7)
[1] 0.3333333

For guidance on choosing a distribution, refer to this StackExchange answer. Also note that with continuous distributions you should compute the probability of an interval, i.e. the difference between the cumulative probabilities at two double (numeric) values, since the probability of any single specific value is zero by definition.
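
As a rough illustration of that last point, the probability of an interval under the assumed continuous uniform(1, 7) is the difference of two `punif()` calls. The endpoints 2.5 and 3.5 below are arbitrary choices for the sketch, not values from the question:

```r
# Assumed model: continuous uniform on [1, 7]
lower <- 2.5   # hypothetical interval endpoints around the value 3
upper <- 3.5

# P(lower < X <= upper) is the CDF at the upper endpoint
# minus the CDF at the lower endpoint
p_interval <- punif(upper, min = 1, max = 7) - punif(lower, min = 1, max = 7)
p_interval
```

Shrinking the interval towards a single point shrinks this difference towards zero.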

To avoid having to choose a distribution, running a simulation is often a safer choice. You can simply sample from the observed values with replacement:

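b <- vector("numeric", 1000)   # preallocate space for the 1000 draws
set.seed(1234)                 # fix the seed so the run is reproducible
for (i in 1:1000){
    # draw one value at random from the observed data
    b[i] <- sample(values_all, size = 1, replace = TRUE)
}
prop.table(table(b))           # relative frequency of each simulated value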

Which returns:

b
    1     2     3     4     5     6     7 
0.144 0.251 0.087 0.053 0.207 0.099 0.159

That is, the simulated probability of the value 3 is 8.7%, and that of the value 5 from the question is 20.7%.
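
The same kind of simulation can probably be written without a loop, since `sample()` accepts `size = 1000` directly. This is a sketch rather than a drop-in replacement: the names `sims` and `p_sim` are illustrative, and the exact numbers may differ from the loop's output and between R versions, because the default sampling algorithm changed in R 3.6.0.

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

set.seed(1234)  # fixed only for reproducibility; any seed would do

# Draw all 1000 values in a single call instead of looping
sims <- sample(values_all, size = 1000, replace = TRUE)

# Share of simulated draws equal to the value of interest
p_sim <- mean(sims == prob_to_find)
p_sim
```

Taking `mean()` of a logical vector gives the proportion of `TRUE` entries, so no table is needed when only one value is of interest.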

score 2 · Answer 2 · answered Aug 30 '18 at 08:15

For question 1 you can use this:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

probability <- sum(values_all == prob_to_find) / length(values_all)

The probability is the number of times the value occurs, i.e. sum(values_all == prob_to_find), divided by the total number of values in your set, length(values_all). For the test data this gives 4 / 20 = 0.2.

For question 2, I left a comment on your question because I need some extra information.
