1

I'm trying to find the Python equivalent of R's rbeta() function in order to do some A/B testing. This is the code in R:

a <- 50
notA <- 200
b <- 200
notb <-400

trials <- 100000

alpha <- 1
beta <- 4

a.samples <- rbeta(trials, a+alpha, notA+beta)
b.samples <- rbeta(trials, b+alpha, notB+beta)

a_wins <- sum(a.samples > b.samples) / trials
b_wins <- sum(b.samples > a.samples) / trials

print(a_wins)
print(b_wins)

In this R example, B wins more than 99% of the time.

This is what I'm trying as the Python equivalent:

import scipy.stats as stats

a = 50
notA = 200
b = 200
notB =400

trials = 100000

alpha = 1
beta = 4

# Random data on which to calculate probability density
inputs = []
inputs = stats.beta(alpha, beta).rvs(size=trials) 

aSamples = stats.beta(a+alpha, notA+beta).pdf(inputs)
bSamples = stats.beta(b+alpha, notB+beta).pdf(inputs)

aWins = sum(aSamples > bSamples) / trials
bWins = sum(bSamples > aSamples) / trials

print("A", aWins)
print("B", bWins)

In the Python equivalent, A wins 75% of the time.

My guess is that the problem arises with inputs, the random stats on which the probability density is calculated. They're generated here with the .rvs() method of scipy.stats.beta. I've also tried the random number module and np.linspace() to no avail. Can someone tell me what I'm missing?


@user20650 solved it in the comments. Calling .pdf() is a redundant step. The function scipy.stats.beta().rvs() is the Python equivalent of R's rbeta() function. The correct code then should read:

aSamples = stats.beta(a+alpha, notA+beta).rvs(trials)
bSamples = stats.beta(b+alpha, notB+beta).rvs(trials)

Thank you everyone who replied. This was melting my head all day yesterday.

amunnelly
  • 1,144
  • 13
  • 22
  • 2
    Have you tried `np.random.beta`? – SamR Apr 17 '22 at 12:29
  • To amplify SamR's suggestion, try doing a search on: python beta distribution – SteveM Apr 17 '22 at 12:35
  • 1
    From a glance; the two approaches are not equivalent. In R, you sample then sum, in python you sample, calculate pdf of these values, then sum – user20650 Apr 17 '22 at 14:01
  • 2
    Thank you @user20650, you have solved it. Adding the `.pdf()` methods were redundant steps. I've updated my initial post with the answer, in the hope that's the correct way of doing things hereabouts now. – amunnelly Apr 17 '22 at 14:45

0 Answers0