I'm taking a Machine Learning class, and we were given our first statistics "programming" exercise.
So the exercise goes like this:
Recall the story from the lecture “Two sellers at Amazon have the same price. One has 90 positive and 10 negative reviews. The other one 2 positive and 0 negative. Who should you buy from?” Write down the posterior probabilities about the reliability (as in the lecture). Calculate p(x > y|D1, D2) using numerical integration. You can generate Beta distributed samples with the function scipy.stats.beta.rvs(a,b,size).
What we know from the lecture is the following:
We applied two Beta-binomial models: p(x|D1) = Beta(x|91, 11) and p(y|D2) = Beta(y|3, 1)
Compute probability that seller 1 is more reliable than seller 2:
p(x > y | D1, D2 ) = ∫∫ [x > y] Beta (x| 91, 11) Beta (y| 3, 1) dx dy
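For reference, the indicator [x > y] in this integral has a direct sampling interpretation: draw x and y independently from their posteriors and count how often x exceeds y. The following is only a minimal sketch of that idea, not necessarily the method intended in the lecture. The sample size `n_samples`, the seed, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Assumed values for illustration: sample size and seed are arbitrary choices.
n_samples = 1_000_000
rng = np.random.default_rng(0)

# Independent draws from the two posteriors given above.
# Note that size must be passed by keyword; the third positional
# argument of beta.rvs is loc, not size.
x = stats.beta.rvs(91, 11, size=n_samples, random_state=rng)  # seller 1
y = stats.beta.rvs(3, 1, size=n_samples, random_state=rng)    # seller 2

# [x > y] becomes a boolean array; its mean estimates p(x > y | D1, D2).
p_mc = np.mean(x > y)
print(p_mc)
```

With this many samples the estimate should land close to the 0.7 mentioned in the exercise, up to Monte Carlo noise.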
My attempt in Python looks like this:
In [1]: import numpy as np
from scipy import integrate, stats
In [2]: f = lambda x, y: stats.beta.rvs(91, 11, x) * stats.beta.rvs(3, 1, y)
In [3]: stats.probplot(result, x > y)
And I receive an error that states:
... The maximum number of subdivisions (50) has been achieved....
but the calculation still returns a result of approximately 1.7. (We were told that the answer is approximately 0.7.)
My question is: How do I calculate the [x > y] part, that is, the probability that seller 1 (x) is more reliable than seller 2 (y)?
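One possible way to handle [x > y] with deterministic integration is to fold the indicator into the integration limits: for each x, integrate y only from 0 to x. The sketch below assumes that approach and uses `stats.beta.pdf` for the densities, since `stats.beta.rvs` returns random draws rather than density values (and its third positional argument is `loc`, not `size`). The helper name `joint_density` is made up for this example. Note that `integrate.dblquad` passes the inner variable as the first argument of the integrand.

```python
from scipy import integrate, stats

def joint_density(y, x):
    """Product of the two posterior densities; dblquad expects func(y, x)."""
    return stats.beta.pdf(x, 91, 11) * stats.beta.pdf(y, 3, 1)

# Outer variable x runs over [0, 1]; inner variable y runs over [0, x],
# which is exactly the region where the indicator [x > y] equals 1.
p_quad, abs_err = integrate.dblquad(
    joint_density,
    0, 1,             # limits for x
    lambda x: 0.0,    # lower limit for y
    lambda x: x,      # upper limit for y
)

# Sanity check under the same model: the Beta(3, 1) CDF is y**3, so
# p(x > y) = E[x**3] for x ~ Beta(91, 11), which has a closed form.
p_exact = (91 * 92 * 93) / (102 * 103 * 104)

print(p_quad, abs_err, p_exact)
```

Both numbers should agree and be close to the 0.7 quoted in the exercise. A result above 1 cannot be a probability, so it indicates that the integrand was not a normalized density.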