
I have conversion rates in my dataframe df_cr:

[image: cr data grouped]

These are conversion rates for every Monday over the past year for posts published on my webpage. If I take the average of conversions in my dataset (i.e. the average over success/trials), I get a conversion rate of about 0.027.
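
That number is just the mean of the per-row success/trials ratios, i.e. roughly:

(df_cr.success / df_cr.trials).mean()  # ≈ 0.027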

This is the range of my trials and success variables:

[image: cr_trial_success]
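
The ranges above come from something like this:

df_cr[['trials', 'success']].agg(['min', 'max'])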

I'm interested in finding the conversion rate distribution via Bayesian methods, using PyMC3. Based on the min/max range of my observations I built the following model:

import pymc3 as pm

with pm.Model() as comparing_days:
    # uniform priors on the Beta parameters, bounded by the observed min/max ranges
    alpha_1 = pm.Uniform('alpha_1', 1000, 10000, shape=1)
    beta_1 = pm.Uniform('beta_1', 40000, 250000, shape=1)
    # a single conversion rate shared across all days
    p_B = pm.Beta('p_B', alpha=alpha_1, beta=beta_1, shape=1)
    # binomial likelihood: successes out of trials per day
    obs = pm.Binomial('obs', n=df_cr.trials,
                      p=p_B, observed=df_cr.success, shape=1)
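
The sampling step itself looks roughly like this (a sketch; the draw and burn-in counts are the ones described below):

with comparing_days:
    trace = pm.sample(50000, tune=1000)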

After running 50K samples (after a 1K burn-in) with pm.sample, I get the output below:

[image: MCMC output]
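
The output above was produced with something like this (reproduced from memory, so the exact calls may differ):

pm.traceplot(trace)
pm.summary(trace, var_names=['alpha_1', 'beta_1', 'p_B'])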

The alpha and beta parameters run up to their respective maximum values, and p_B ends up being very narrow. So I am doing something wrong (I'm new to PyMC3). Is there something wrong with my priors? Does it make sense at all to pick the Beta distribution's parameters using uniform priors? Following @merv's comments, I ran posterior predictive checks:

import arviz as az

with comparing_days:
    # draw posterior predictive samples from the fitted model
    ppc = pm.sample_posterior_predictive(
        trace, var_names=["alpha_1", "beta_1", "p_B", "obs"], random_seed=12345
    )
    # compare the posterior predictive of obs against the observed data
    az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=comparing_days))

I guess that doesn't look too bad. Is it because I have enough data that the prior gets diluted?

[image: posterior_check]

  • I sort of discussed this issue with priors in a Beta-Binomial [in this answer](https://stackoverflow.com/questions/54545661/pymc-with-observations-on-multiple-variables/54598998#54598998), though there the bounds were too high. In your case, your priors are very likely too restrictive, which forces the posterior predictive model to have way more uncertainty than the amount of data would naively imply. Were the records i.i.d. (the posts have no effect), then the sum of `alpha` and `beta` should be close to the sum of trials. If I get time, I will answer with a fuller explanation. – merv Jan 21 '21 at 04:20
  • Also, it might be worth checking out [posterior predictive checks](https://docs.pymc.io/notebooks/posterior_predictive.html). Learning how to use these would enable you to generically (i.e., for simple or complex models) answer your own questions about the model appropriateness (which are definitely the right questions to be asking!). – merv Jan 21 '21 at 04:24
  • Hi @merv thank you for your good comments! I added the posterior check, what do you think? – hadron Jan 28 '21 at 16:59
  • I agree that the PPC doesn't look terrible. However, I'm very surprised at the axis. The snippet of your data that you shared had single-digit successes, but this shows that the bulk of your data has between 2000-6000 successes. Is that really the case? Maybe you can show a pair plot of your raw data (trials vs success). – merv Jan 28 '21 at 19:57
  • hi @merv, I edited my post. I grouped the data by day, summing trials and success over all posts per day. The behavior of beta_1 doesn't change, it still runs up to the max value – hadron Jan 29 '21 at 09:28
  • Updating the link to the "posterior predictive checks" notebook, as things have moved around on the PyMC website: https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/posterior_predictive.html – ultrasounder Feb 28 '23 at 18:57

0 Answers