
I have two questions that I'd like to solve using R.

I have a vector of values whose distribution is unknown.

  1. How do I calculate the probability of one of the values in the vector in R?
  2. How do I calculate the probability of one value occurring by simulating 1000 times?

My test data is as follows:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

Grateful for any assistance.

cephalopod
  • What did you try? Asking two questions in the same post reads like a "do it for me" request. – agstudy Aug 30 '18 at 08:07
  • You say you want to simulate, but you don't mention how your data is derived. Perhaps you could use `sample()` to generate a random set of values e.g. `sample(1:10,size=20,replace=T)` – P1storius Aug 30 '18 at 08:10
  • "I have a vector of values whose distribution is unknown." So, your values are a sample from a distribution? Is anything known about the distribution? E.g., your example seems to indicate that it is a discrete distribution of positive integer values. This can't be solved without additional information or assumptions. – Roland Aug 30 '18 at 08:32

2 Answers


To estimate the probability of a value from an unknown distribution, you can compute the relative frequency of each value in the vector:

prop.table(table(values_all))

which outputs:

values_all
   1    2    3    4    5    6    7 
0.15 0.25 0.10 0.05 0.20 0.10 0.15 
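
To pull a single value's probability out of that table, one option is to index it by name. This is a minimal sketch that assumes the `values_all` and `prob_to_find` objects from the question; the names `probs` and `p_hat` are introduced here purely for illustration:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

# Relative frequency of every distinct value in the vector
probs <- prop.table(table(values_all))

# Table names are character strings, so look the value up by its name;
# as.numeric() drops the table attributes and leaves a plain number
p_hat <- as.numeric(probs[as.character(prob_to_find)])
p_hat  # 4 of the 20 values equal 5, i.e. 0.2
```

Looking the value up by name rather than by position avoids surprises when the observed values are not the consecutive integers 1, 2, 3, and so on.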

Alternatively, you can assume a distribution after inspecting your vector. For example, under a continuous uniform(1, 7) the cumulative probability P(X ≤ 3) would be:

> punif(3, min = 1, max = 7)
[1] 0.3333333

For guidance on choosing a distribution, refer to this StackExchange answer. Also note that with continuous distributions you should compute the probability of an interval, as the difference between the cumulative distribution function evaluated at two numeric values, since the probability of any single exact value is zero by definition.
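
As a rough illustration of the interval idea, the probability of falling between two values can be computed as a difference of two `punif()` calls. The uniform(1, 7) distribution and the end points 3 and 4 are assumptions chosen only for this sketch, and `lower`, `upper` and `p_interval` are illustrative names:

```r
# Assumption: the data follow a continuous uniform distribution on [1, 7]
lower <- 3
upper <- 4

# P(lower < X <= upper) is the difference of the CDF at the two end points
p_interval <- punif(upper, min = 1, max = 7) - punif(lower, min = 1, max = 7)
p_interval  # (4 - 3) / (7 - 1), i.e. 1/6
```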

If you would rather not commit to a distribution, you can simulate instead by sampling from the observed values with replacement:

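# Pre-allocate a numeric vector for the 1000 simulated draws
b <- vector("numeric", 1000)
set.seed(1234)  # fix the seed so the run is reproducible
for (i in 1:1000) {
    # draw one value at random from the observed data
    b[i] <- sample(values_all, size = 1, replace = TRUE)
}
# relative frequency of each simulated value
prop.table(table(b))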

Which returns:

b
    1     2     3     4     5     6     7 
0.144 0.251 0.087 0.053 0.207 0.099 0.159

That is, the simulated probability of the value 3 is 8.7% (and 20.7% for the value 5 from the question).
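
The same simulation can probably be written without a loop, since `sample()` can draw all values in one call. The sketch below is one possible variant, not the original answer's code. `n_sim`, `sim`, `p_sim` and `se_sim` are names made up for illustration, and the exact proportion it produces depends on the seed and on the R version's sampling algorithm:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5
n_sim <- 1000

set.seed(1234)
# Draw all simulated values in a single call, sampling with replacement
sim <- sample(values_all, size = n_sim, replace = TRUE)

# Share of simulated draws equal to the target value
p_sim <- mean(sim == prob_to_find)

# Approximate standard error of that simulated proportion
se_sim <- sqrt(p_sim * (1 - p_sim) / n_sim)

p_sim
se_sim
```

Because the draws come from the observed values themselves, the simulated proportion should land close to the empirical frequency (0.20 for the value 5), differing only by sampling noise.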

000andy8484

For question 1 you can use this:

values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))
prob_to_find <- 5

probability <- sum(values_all == prob_to_find) / length(values_all)

The probability is the number of times the value occurs, i.e. `sum(values_all == prob_to_find)`, divided by the total number of values in your set.
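
Since the mean of a logical vector is the share of `TRUE` entries, the same calculation can also be wrapped in a small function. `empirical_prob` is a hypothetical helper introduced only for this sketch:

```r
values_all <- c(rep(1, 3), rep(2, 5), rep(3, 2), 4, rep(5, 4), rep(6, 2), rep(7, 3))

# Hypothetical helper: share of the elements of x that equal value
empirical_prob <- function(x, value) {
  mean(x == value)
}

empirical_prob(values_all, 5)  # 4 / 20 = 0.2
empirical_prob(values_all, 8)  # 0, because 8 never occurs in the data
```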

For question 2, I commented on your question because I need some extra information.

P1storius