
I have two distributions (sets of values) and wish to know the probability that one set "fits" into another. Something like this:

dist = [355, 221, 302, ...]
values = [550, 537, 404, ...]
odds = odds_all_values_in_dist(dist,values)

I haven't tried anything yet, as I don't know probability theory well enough and have hardly used scipy at all. I also don't know which distribution might be suitable; the data points are "number of clicks per day" (sort of).

Edit: dist are clicks/day last month, values are clicks/day current month. Hope this helps clarify what I'm trying to achieve despite my lack of knowledge in probability theory and math. :)

Edit 2: This month there has been a 50% increase in clicks. Given the number of clicks per day in the previous and current month, what are the odds that this increase is due to chance?

Jonas Byström
  • I find this quite unclear. What do you even *mean* by "the probability that one set 'fits' into another"? Set A either is or isn't a subset of set B, with no probability question at all (unless the sets are somehow randomly generated - but, if so, you haven't explained how) so by "fits in" you don't seem to mean "is a subset of" but then, what do you mean? – John Coleman Oct 27 '15 at 13:08
  • @JohnColeman: I tried to clarify in my edit, did it help? Please edit if you can explain it better! – Jonas Byström Oct 27 '15 at 13:30
  • It helps some but still leaves things vague. You need to have a probability model that describes the distribution of clicks per day in order for your question to make sense. Once you come up with a probability distribution on clicks per day then it makes sense to ask if one randomly generated sample of a given size is a subset of another randomly generated sample of another size. As it is, nothing that you have written gives any indication of how the clicks are distributed. Just guessing here -- but a Poisson distribution might work (use a goodness of fit test to confirm). – John Coleman Oct 27 '15 at 13:39
  • If you want to test whether the two samples are drawn from the same distribution, the [KS test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) might be appropriate. This is in Scipy under [``ks_2samp``](http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ks_2samp.html). But please make sure you understand the test before blindly using the code! – jakevdp Oct 27 '15 at 13:40
  • jakevdp's comment raises an important question -- are you interested in the probability that each clicks per day in one month was seen in the previous month or are you interested in the question of whether or not the clicks per day in one month has the same overall distribution as the clicks per day in the previous month? – John Coleman Oct 27 '15 at 13:58
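
For the second reading of the question (did clicks per day really go up, or could the increase be chance?), one option that needs no distributional model is a permutation test on the difference in means. This is not the KS test mentioned in the comments; it is a swapped-in, simpler technique, sketched here with the standard library only. The function name `permutation_p_value`, its parameters, and the sample numbers are all made up for illustration, and the test assumes the daily counts are exchangeable under the null hypothesis (no weekday effects, no trend).

```python
import random


def permutation_p_value(last_month, this_month, n_resamples=10_000, seed=0):
    """One-sided permutation test: is mean(this_month) larger than
    mean(last_month) by more than random relabelling would produce?

    Hypothetical helper, not part of scipy or any other library.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_this = len(this_month)
    n_last = len(last_month)
    observed = sum(this_month) / n_this - sum(last_month) / n_last

    pooled = list(last_month) + list(this_month)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign days to "this month" and "last month".
        rng.shuffle(pooled)
        diff = sum(pooled[:n_this]) / n_this - sum(pooled[n_this:]) / n_last
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Made-up clicks/day, NOT the data from the question.
example_last = [300, 280, 320, 310, 290, 305, 295]
example_this = [450, 430, 470, 440, 460, 455, 445]
p_value = permutation_p_value(example_last, example_this)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value says that an increase this large would rarely arise from shuffling the same days between the two months; it is not literally "the odds that the increase is due to chance". To compare the whole shape of the two distributions rather than just their means, the `ks_2samp` function linked in the comments is the tool to look at.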
