Getting posterior distribution of difference between two variables using PYMC3

Question

Assume we are looking at daily prices of two stocks, A and B. The prior is simple: the prices are all normally distributed, with mu_A and mu_B both uniformly distributed on [10,100] and sigma_A and sigma_B both uniformly distributed on [1,10]. (I know these are naive/wrong assumptions; they are only there to make the question clearer.)

Now assume I have observed these two stocks for a month and collected the price data. I can get the posterior distributions of A and B separately, but I don't know how to get the posterior distribution of the difference between the two stocks.

import pymc3 as pm

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm.Model()
with basic_model: 
    mu_A = pm.Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm.Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm.Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm.Uniform('sigma_B', lower=1, upper=10)
    A = pm.Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm.Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm.Deterministic('dif', A-B)
map_estimate = pm.find_MAP(model=basic_model)
map_estimate

However, the resulting estimate does not give me a distribution of dif. Am I confusing the concept of a posterior distribution?

Turn both arrays into percentage gains from the first date, giving you two arrays of equal length. Subtract one from the other to get the percentage difference, then take the distribution. — ajsp, Jan 25 '18 at 08:06
@ajsp, thanks for your advice. I updated the question to clarify it a bit more. I think this is a better way to phrase it: given the simple prior I already had and the prices_A and prices_B that I observed later, how do I get the posterior distribution of the difference in prices between the two stocks? — Cong Ba, Jan 26 '18 at 03:05

aloctavodia · Answer 1 · 2018-01-26T12:54:22.367

Subtract one variable from the other. You can do it after sampling:

C = trace['A'] - trace['B']

Or you can do it as part of your model, using a deterministic variable:

C = pm.Deterministic('C', A - B)

Update:

Now that you have posted your model, I suggest the following:

import pymc3 as pm

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm.Model()
with basic_model:
    mu_A = pm.Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm.Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm.Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm.Uniform('sigma_B', lower=1, upper=10)
    A = pm.Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm.Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm.Deterministic('dif', mu_A-mu_B) # diff of the means
    trace = pm.sample()  # draw samples from the posterior

pm.summary(trace)

Basically, what I am suggesting is that you do not use find_MAP(), which returns only a point estimate (the posterior mode) rather than a distribution. Instead, sample from the posterior and then compute what you want from those samples (stored in trace). For example, summary will give you the mean, standard deviation and other quantities computed from the posterior samples.
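To make "compute what you want from the samples" concrete, here is a minimal sketch in plain Python. It does not call PyMC3: dif_samples is a hypothetical stand-in for trace['dif'], filled with made-up normal draws, so the numbers it prints are illustrative only and are not the posterior for the prices above.

```python
import random
import statistics

random.seed(0)

# Stand-in for trace['dif']: hypothetical posterior draws of mu_A - mu_B.
# With a real PyMC3 trace you would use `dif_samples = trace['dif']` instead.
dif_samples = [random.gauss(-25.0, 3.0) for _ in range(4000)]

# Posterior mean and standard deviation, estimated from the draws.
post_mean = statistics.mean(dif_samples)
post_sd = statistics.stdev(dif_samples)

# Equal-tailed 95% interval read off the sorted draws.
ordered = sorted(dif_samples)
n = len(ordered)
lower = ordered[int(0.025 * n)]
upper = ordered[int(0.975 * n) - 1]

# Posterior probability that the mean of A is below the mean of B.
prob_a_below_b = sum(d < 0 for d in dif_samples) / n

print(f"mean={post_mean:.2f} sd={post_sd:.2f}")
print(f"95% interval=({lower:.2f}, {upper:.2f})")
print(f"P(mu_A < mu_B)={prob_a_below_b:.3f}")
```

The posterior "distribution" is the whole collection of draws; the mean, standard deviation, interval and probability are just summaries of it, and a histogram of dif_samples would show its shape.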

You may also want to use sample_ppc to get "posterior predictive samples" (in later PyMC3 releases this function is called sample_posterior_predictive):

ppc = pm.sample_ppc(trace, samples=1000, model=basic_model)
dif_ppc = ppc['Y_1'] - ppc['Y_2']

dif_ppc represents the differences you expect to see for your stocks, including the uncertainty in the means and standard deviations of your stocks.
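The distinction between the difference of the means and the predictive difference can be sketched without PyMC3. In the snippet below, draws is a hypothetical stand-in for the posterior samples of (mu_A, sigma_A, mu_B, sigma_B); the values are invented and drawn independently, which a real posterior need not be.

```python
import random
import statistics

random.seed(1)

# Hypothetical posterior draws (mu_A, sigma_A, mu_B, sigma_B); in practice
# these would come from trace['mu_A'], trace['sigma_A'], and so on.
draws = [
    (random.gauss(25.0, 1.5), random.uniform(2.0, 5.0),
     random.gauss(49.5, 2.0), random.uniform(3.0, 7.0))
    for _ in range(4000)
]

# Difference of the means: reflects uncertainty about the parameters only.
dif_means = [mu_a - mu_b for mu_a, _, mu_b, _ in draws]

# Predictive difference: simulate one new price per stock for every draw,
# so the day-to-day noise (sigma_A, sigma_B) is added on top.
dif_pred = [
    random.gauss(mu_a, s_a) - random.gauss(mu_b, s_b)
    for mu_a, s_a, mu_b, s_b in draws
]

sd_means = statistics.stdev(dif_means)
sd_pred = statistics.stdev(dif_pred)
print(f"sd of mu_A - mu_B:     {sd_means:.2f}")
print(f"sd of predicted A - B: {sd_pred:.2f}")
```

The predictive spread is wider because it answers a different question: how far apart two future prices might be, rather than how far apart the average prices are.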

As a side note, you may want to replace your Uniform priors with other distributions, such as Normal for the means and HalfNormal for the sigmas.

Thank you! I just updated the question with the code that I use. However, I still just get a list of numbers for 'dif', not an estimated mean and std. When we talk about a posterior distribution, do we just look at the observed measurements/numbers, or do we come up with the posterior mean and std? Sorry if I'm making the question confusing. Maybe I can ask it this way: given prices_A, prices_B and the simple prior I already had, how do I get the posterior distribution of the difference in prices between the two stocks? — Cong Ba, Jan 26 '18 at 03:03
Notice that you do not get "a distribution" but samples from the posterior distribution. I just updated my answer; I hope it helps to make things clear. — aloctavodia, Jan 26 '18 at 12:55
