
How can I compute the probability P(X < Y) in R, knowing that X and Y are independent random variables with X ~ Beta(1,1) and Y ~ Beta(2,3)?

kulala
  • Note that when the parameters a and b are integers, the beta pdf is proportional to a polynomial, so when you set up your integral over X and Y, the integrand is a product of polynomials, and the main trick is to get the bounds of integration right. Note further that beta(1, 1) is a constant, so the integrand is even simpler. – Robert Dodier Jun 21 '21 at 06:05
  • 2
    By the way, this is off topic for this forum; try stats.stackexchange.com instead, and also show your own work on the problem. – Robert Dodier Jun 21 '21 at 06:13
  • I’m voting to close this question because it's not a programming question – Ben Bolker May 09 '23 at 18:52

5 Answers


The cubature package is a tried and tested set of tools for multivariate integration. Build a function that passes its two arguments to the appropriate density functions, include the condition for the inequality as a logical factor that is numerically 1 only in the region where X < Y, and integrate over the joint range of (X, Y), i.e. the unit square [0,1] x [0,1].

Sadly it is necessary to use a single name for the vector of values sent to the function, so it's not quite as transparent as it might otherwise be:

library(cubature)
prod2beta <- function(x){ (x[1] < x[2]) *    # (X < Y) logical times...
                           dbeta(x[1],1,1) * # X density ...
                           dbeta(x[2], 2,3)} # times Y density

hcubature( prod2beta, lower=c(0,0), upper=c(1,1)) # integrate over unit square
#-------------------
 $integral
[1] 0.4

$error
[1] 3.999939e-06

$functionEvaluations
[1] 2168775

$returnCode
[1] 0

Here's a wireframe plot to aid in understanding the geometric situation:

[wireframe plot of the integrand over the unit square]

IRTFM

Implementing the analytical solution seems quite challenging and computationally intensive.

If you're happy with an approximate solution, try either of the following:

Method 1: Simulation

n <- 1000000
x <- rbeta(n, 1, 1)
y <- rbeta(n, 2, 3)

sum(x < y)/n
[1] 0.399723

Results here depend on the n you choose and on the state of the RNG, but larger n will yield increasingly accurate estimates.
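
To gauge the precision of such an estimate, the usual binomial standard error of the simulated proportion can be computed alongside it; a minimal sketch (the set.seed() call is just there so the numbers are reproducible):

set.seed(42)                            # fix the RNG so the result is reproducible
n <- 1e6
x <- rbeta(n, 1, 1)
y <- rbeta(n, 2, 3)
p_hat <- mean(x < y)                    # Monte Carlo estimate of P(X < Y)
se    <- sqrt(p_hat * (1 - p_hat) / n)  # binomial standard error of the proportion
c(estimate = p_hat, std_error = se)     # roughly 0.4 with a standard error near 5e-4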

Method 2: Normal Approximation

See Cook (2012) for the method.

x_a <- x_b <- 1
y_a <- 2
y_b <- 3

mu_x <- x_a / (x_a + x_b)
mu_y <- y_a / (y_a + y_b)

var_x <- (x_a*x_b) / ( (x_a + x_b)^2 * (x_a + x_b + 1) )
var_y <- (y_a*y_b) / ( (y_a + y_b)^2 * (y_a + y_b + 1) )

pnorm((mu_y - mu_x) / ((var_y + var_x)^0.5))
[1] 0.3879188

This is less computationally intensive and still fairly accurate. According to Cook, the average absolute error for this kind of approximation is 0.006676, and in your case it is no higher than 0.05069.
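
For comparison, the exact value can be checked in base R with a one-dimensional integral, via the identity P(X < Y) = integral over [0,1] of F_X(y) * f_Y(y) dy; a quick sketch:

# one-dimensional quadrature: F_X(y) = pbeta(y, 1, 1), f_Y(y) = dbeta(y, 2, 3)
integrate(function(y) pbeta(y, 1, 1) * dbeta(y, 2, 3), lower = 0, upper = 1)
# should return 0.4 (up to a tiny numerical error)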

Rory S
  • The answers above are for X > Y whereas the question was for the converse. – IRTFM Jun 20 '21 at 22:05
  • I dunno about the difficulty of the problem. The general solution to which you linked is a mess for general a and b values, but for integers it's much simpler. OP was given beta(1, 1) for one of the terms which is a constant, so it's even simpler. Paper and pencil solutions give a lot more insight so in order to gain that insight, it should be the first thing to try. – Robert Dodier Jun 21 '21 at 06:08
  • @IRTFM Thanks, I completely missed that when I originally answered! Fixed it with an edit. – Rory S Jun 21 '21 at 07:12

We can use mean() for a Monte Carlo simulation:

> n <- 1e6

> mean(rbeta(n, 1, 1) < rbeta(n, 2, 3))
[1] 0.400643
ThomasIsCoding

As Robert Dodier pointed out, the problem is tractable analytically if all the parameters are integers, and easy if one distribution is Beta(1,1), whatever the parameters of the other. In general, if we have independent variables X (density p, CDF P) and Y (density q, CDF Q) on [0,1], then

P(X<Y) = Integral{ 0<=x<y | p(x)q(y)}
       = Integral{ 0<=y<=1 | Integral{ 0<=x<=y | p(x)} q(y)} 
       = Integral{ 0<=y<=1 | P(y)q(y)}

In the case at hand, p(x) = 1 and so P(x) = x, giving

P(X<Y) = Integral{ 0<=y<=1 | y*q(y)}

substituting for q,

P(X<Y) = Integral{ 0<=y<=1 | y*pow(y,a-1)*pow(1-y,b-1)} / B(a,b)
       = Integral{ 0<=y<=1 | pow(y,a)*pow(1-y,b-1)} / B(a,b)

But the integrand is the density of a Beta(a+1,b) variable scaled by B(a+1,b), so

P(X<Y) = B(a+1,b)/B(a,b)

And using the definition of B in terms of gamma functions and the recurrence Gamma(a+1) = a*Gamma(a),

P(X<Y) = a/(a+b)

And in the particular case a = 2, b = 3, this gives P(X<Y) = 2/5 = 0.4.
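
A quick numerical check of this closed form in R, as a sketch for the particular case a = 2, b = 3:

a <- 2; b <- 3
beta(a + 1, b) / beta(a, b)  # B(a+1, b) / B(a, b)
a / (a + b)                  # both evaluate to 0.4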

dmuir

Beta(1,1) is the uniform distribution on (0,1). Let's denote it by U.

P(Y > U) = integral over [0,1] of P(Y > u) du = integral over [0,1] of (1 - F(u)) du, where F is the CDF of Y. It is known that this equals E(Y), the expectation of Y, which is a/(a+b) = 2/5 = 0.4 here.
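
A minimal sketch of checking this identity numerically in R, assuming Y ~ Beta(2, 3) as in the question:

# integral over [0,1] of (1 - F(u)) du, with F the CDF of Beta(2, 3)
integrate(function(u) 1 - pbeta(u, 2, 3), lower = 0, upper = 1)  # should return 0.4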

Stéphane Laurent