Generating Integer Sequences based on a Modified Bernoulli Distribution

Question

I want to use R to randomly generate an integer sequence that each integer is picked from a pool of integers (0,1,2,3....,k) with replacement. k is pre-determined. The selection probability for every integer k in (0,1,2,3....,k) is p^k(1-p) where p is pre-determined. That is, 1 has much higher probability to be picked compared to k and my final integer sequence will likely have more 1 than k. I am not sure how to implement this number selecting process in R.

Your notation is pretty messed up. What do you mean when you say "for every integer k in (0,1,2,3....,k)"? Please clarify, this makes no sense. — pjs, Mar 13 '21 at 19:23

score 2 · Accepted Answer · edited Mar 13 '21 at 22:47

A generic approach to this type of problem would be:

Calculate the p^k * (1-p) for each integer
Create a cumulative sum of these in a table t.
Draw a number from a uniform distribution with range(t)
Measure how far into t that number falls and check which integer that corresponds to.
The larger the probability for an integer is, the larger part of that range it will cover.

Here's quick and dirty example code:

draw <- function(n=1, k, p) {
    v <- seq( 0, k )
    pr <- (p ** v) * (1-p)
    t <- cumsum(pr)
    r <- range(t)
    x <- runif( n, min=min(r), max=max(r) )
    f <- findInterval( x, vec=t )
    v[ f+1 ] ## first interval is 0, and it will likely never pass highest interval
}

Note, the proposed solution doesn't care if your density function adds up to 1. In real life it likely will, based on your description. But that's not really important for the solution.

Peter O. · Answer 2 · 2022-04-06T18:43:30.273

The answer by Sirius is good. But as I can tell, what you're describing is something like a truncated geometric distribution.

I should note that the geometric distribution is defined differently in different works (see MathWorld, for example), so we use the distribution defined as follows:

P(X = x) ~ p^x * (1 - p), where x is an integer in [0, k].

I am not very familiar with R, but the solution involves calling rgeom(1, 1 - p) until the result is k or less.

Alternatively, you can use a generic rejection sampler, since the probabilities are known (better called weights here, since they need not sum to 1). Rejection sampling is described as follows:

Assume and each weight is 0 or greater. Store the weights in a list. Calculate the highest weight, call it max. Then, to choose an integer in the interval [0, k] using rejection sampling:

Choose a uniform random integer i in the interval [0, k].
With probability weights[i]/max (where weights[i] = p^i * (1-p) in your case), return i. Otherwise, go to step 1.

Given the weights for each item, there are many other ways to make a weighted choice besides rejection sampling or the solution in Sirius's answer; see my note on weighted choice algorithms.

Generating Integer Sequences based on a Modified Bernoulli Distribution

2 Answers2