-2

what is the equivalent of =BETADIST(0.5,8,3) of excel in python ?

I have tried using scipy.stats and unable to achieve same results..

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135
  • 3
    " have tried using scipy.stats and unable to achieve same results" what _specifically_ have you tried, and what went wrong with your attempts? A question should include a [mcve] so that we can offer specific answers and help – G. Anderson Feb 24 '23 at 23:03

2 Answers2

2

As G. Anderson points out this depends on your specific application. For a start you can define a beta distribution in scipy, draw some samples and and plot them as follows:

import scipy
import numpy as np
import matplotlib.pyplot as plt

# the true distribution parameters
a_true = 2
b_true = 3

# define a beta distribution and draw some samples
beta_true = scipy.stats.beta(a=a_true, b=b_true)
y_obs = beta_true.rvs(size=500)

x = np.linspace(0, 1, 100)

# plot the samples and PDF
plt.plot(x, beta_true.pdf(x), lw=2, color="r", label='PDF beta_true')
plt.hist(y_obs, bins=20, density=True, label="samples")
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()

If what you want is to infer parameters for a stochastic process that you assume follows a beta distribution you could use pymc:

import pymc as pm
import arviz as az

model = pm.Model()

with model:
    # priors for a and b
    a = pm.HalfNormal("a", sigma=1)
    b = pm.HalfNormal("b", sigma=1)
    
    # observed RV
    y = pm.Beta("y", alpha=a, beta=b, observed=y_obs)
    
    tr = pm.sample(1000, tune=2000, chains=4, cores=8, return_inferencedata=True)
>>> az.summary(tr, round_to=4)

    mean    sd  hdi_3%  hdi_97%     mcse_mean   mcse_sd     ess_bulk    ess_tail    r_hat
a   1.9152  0.1089  1.7109  2.1217  0.0035  0.0025  973.6684    1550.1573   1.0024
b   2.9142  0.1751  2.5962  3.2444  0.0057  0.0040  949.3716    1590.1557   1.0036
# estimated beta distribution
a_hat, b_hat = az.summary(tr, round_to=4)["mean"]
beta_estimated = scipy.stats.beta(a=a_hat, b=b_hat)

# plot the PDFs of the true and the estimated distributions of RV "y"
plt.hist(y_obs, bins=20, density=True, label="samples")
plt.plot(x, beta_estimated.pdf(x), lw=2, label='PDF beta_estimated')
plt.plot(x, beta_true.pdf(x), lw=2, color="r", label='PDF beta_true')
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()

Please specify your question so we can give specific advice in return.

plasma_r
  • 116
  • 5
1

There are two functions in SciPy that provide the same calculation as Excel's BETADIST(x, alpha, beta): the function scipy.special.btdtr(alpha, beta, x) and the cdf method of the distribution scipy.stats.beta. (Note that btdtr has parameters in a different order.)

In [13]: from scipy.special import btdtr

In [14]: from scipy.stats import beta

Either of these computes the same result (other than a tiny numerical difference that is the result of floating point imprecision):

In [15]: btdtr(8, 3, 0.5)
Out[15]: 0.054687500000000014

In [16]: beta.cdf(0.5, 8, 3)
Out[16]: 0.0546875

If all you need is the CDF for the standard beta distribution, then use scipy.special.btdtr, because beta.cdf(0.5, 8, 3) is 50 times slower than btdtr(8, 3, 0.5):

In [18]: %timeit btdtr(8, 3, 0.5)
13.5 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [19]: %timeit beta.cdf(0.5, 8, 3)
673 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

If you will eventually need the PDF, or a function to fit the beta distribution to data, or you need to set the bounds of the support to arbitrary points, then look that scipy.stats.beta.

Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214