Questions tagged [probability]

Consider whether your question would be better suited to stats.stackexchange.com. Probability covers uncertainty, random phenomena, random numbers, random variables, probability distributions, sampling, and combinatorics.

See also https://statistics.stackexchange.com

Probability theory is a branch of mathematics that studies uncertainty and random phenomena. It introduces a sample space (the set of possible outcomes) and assigns probabilities (numbers between 0 and 1, inclusive) to certain subsets of this set, called events, in a way that satisfies a small set of axioms (Kolmogorov's axioms). A random variable is a function that maps each outcome to a real number; a random vector maps each outcome to a point in a Euclidean space. Random variables and random vectors have associated probability distributions, which can be characterized by probability density functions, cumulative distribution functions, moments, and characteristic or moment-generating functions.
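As a small illustration of these definitions, here is a discrete case (a fair six-sided die), where a probability mass function takes the place of a density. The sketch below computes the CDF and the first two moments exactly with Python's `fractions` module; the names `pmf`, `cdf` and `moment` are illustrative, not taken from any library.

```python
from fractions import Fraction

# Sample space {1, ..., 6} for a fair die; each outcome gets probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))


def moment(k):
    """k-th raw moment E[X**k]."""
    return sum(p * face**k for face, p in pmf.items())


mean = moment(1)
variance = moment(2) - mean**2  # Var[X] = E[X^2] - E[X]^2

print(f"P(X <= 3) = {cdf(3)}, E[X] = {mean}, Var[X] = {variance}")
# prints: P(X <= 3) = 1/2, E[X] = 7/2, Var[X] = 35/12
```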

Typically, questions with this tag deal with computing the probabilities of certain events, exactly or approximately (from winning a lottery to server outages), drawing random samples, approximating distributions, and so on. There may be overlap with statistics and with statistical packages (R, SAS, Stata).
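For example, the probability of rolling at least one six in four throws of a fair die can be computed exactly as 1 - (5/6)^4, or approximated by simulation. A minimal Python sketch comparing the two; the seed and the number of trials are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact: complement of "no six in four throws".
exact = 1 - Fraction(5, 6) ** 4  # = 671/1296, about 0.518


def estimate(trials, rng):
    """Monte Carlo estimate: fraction of simulated games with at least one six."""
    hits = 0
    for _ in range(trials):
        if any(rng.randint(1, 6) == 6 for _ in range(4)):
            hits += 1
    return hits / trials


approx = estimate(100_000, random.Random(42))
print(f"exact = {float(exact):.4f}, simulated = {approx:.4f}")
```

The simulated value fluctuates around the exact one; its standard error shrinks roughly as 1/sqrt(trials).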

Synonym: probability-theory

4021 questions
26 votes, 7 answers

Select random row from a PostgreSQL table with weighted row probabilities

Example input:

    SELECT * FROM test;
     id | percent
    ----+---------
      1 |      50
      2 |      35
      3 |      15
    (3 rows)

How would you write such a query, so that on average 50% of the time I get the row with id=1, 35% of the time the row with id=2, and 15% of the time…
Oleg Golovanov
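One common approach is to take a running (cumulative) sum of the weights, draw a uniform number in [0, total), and pick the first row whose cumulative weight exceeds it. In PostgreSQL the running sum can be computed with a window function such as `sum(percent) OVER (ORDER BY id)` and compared against `random()` scaled by the total. The Python sketch below shows the same idea on the example rows; `pick_weighted` is a hypothetical helper, not a PostgreSQL feature.

```python
import bisect
import itertools
import random
from collections import Counter

ROWS = [(1, 50), (2, 35), (3, 15)]  # (id, percent) from the example table


def pick_weighted(rows, rng=random):
    """Return a row id with probability proportional to its weight."""
    ids = [row_id for row_id, _ in rows]
    cumulative = list(itertools.accumulate(w for _, w in rows))  # [50, 85, 100]
    r = rng.random() * cumulative[-1]  # uniform in [0, total)
    i = bisect.bisect_right(cumulative, r)  # first index with cumulative[i] > r
    return ids[min(i, len(ids) - 1)]  # guard against floating-point round-up


rng = random.Random(0)
counts = Counter(pick_weighted(ROWS, rng) for _ in range(100_000))
```

The counts come out close to 50/35/15 percent; the weights need not sum to 100, since the draw is scaled by their total.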
25 votes, 1 answer

Probability of hash collision

I am looking for some precise math on the likelihood of collisions for MD5, SHA1, and SHA256 based on the birthday paradox. I am looking for something like a graph that says "If you have 10^8 keys, this is the probability. If you have 10^13 keys,…
Dark Nebula
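Assuming the hash outputs behave like independent, uniformly distributed values (a modelling assumption that says nothing about deliberately constructed collisions, which are practical for MD5 and SHA-1), the birthday approximation gives p ≈ 1 - exp(-n(n-1)/(2N)) for n keys and N = 2^bits possible values. A minimal Python sketch; `math.expm1` keeps precision when p is tiny, and the exact product form is included to check the approximation on the classic 365-day case.

```python
import math


def collision_probability(n, space):
    """Birthday approximation of P(at least one collision) among n uniform
    draws from `space` possible values: 1 - exp(-n(n-1) / (2 * space))."""
    return -math.expm1(-n * (n - 1) / (2 * space))


def exact_collision_probability(n, space):
    """Exact birthday probability (product of no-collision terms); only practical for small n."""
    if n > space:
        return 1.0
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (space - i) / space
    return 1 - p_no_collision


for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    for n in (10**8, 10**13):
        print(f"{name:8} n={n:.0e}: p ~ {collision_probability(n, 2**bits):.3e}")
```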
25 votes, 7 answers

C puzzle: Make a fair coin from a biased coin

How can I determine the probability that a function returns 0 or 1 in the following case? Let function_A return 0 with probability 40% and 1 with probability 60%. Write a function_B that returns 0 and 1 with 50-50 probability, using only function_A …
garima
25 votes, 4 answers

Is there a Python equivalent to R's sample() function?

I want to know if Python has an equivalent to the sample() function in R. The sample() function takes a sample of the specified size from the elements of x, either with or without replacement. The syntax is: sample(x, size, replace = FALSE,…
Bilal
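The standard library covers most cases: `random.sample(population, k)` draws without replacement and `random.choices(population, weights=..., k=...)` draws with replacement. (With NumPy, `numpy.random.default_rng().choice(a, size, replace, p)` is the closest single-call analogue.) The sketch below wraps the stdlib calls in a hypothetical `sample` helper that mirrors R's argument names; for weighted sampling without replacement it draws one item at a time and renormalizes over the remaining items, which is how R's documentation describes `prob` with `replace = FALSE`.

```python
import random


def sample(x, size=None, replace=False, prob=None, rng=random):
    """Hypothetical stdlib helper mirroring R's sample(x, size, replace, prob)."""
    x = list(x)
    if size is None:
        size = len(x)  # like R, default to a random permutation of x
    if replace:
        # Independent draws; prob=None means uniform weights.
        return rng.choices(x, weights=prob, k=size)
    if size > len(x):
        raise ValueError("cannot take a sample larger than the population when replace=False")
    if prob is None:
        return rng.sample(x, size)
    # Weighted without replacement: draw one item at a time and remove it, so each
    # later draw is proportional to the weights of the items still remaining.
    items, weights = x[:], list(prob)
    chosen = []
    for _ in range(size):
        i = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen
```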
25 votes, 12 answers

Python - Is a dictionary slow for finding the frequency of each character?

I am trying to find the frequency of each symbol in any given text using an O(n) algorithm. My algorithm looks like:

    s = len(text)
    P = 1.0/s
    freqs = {}
    for char in text:
        try:
            freqs[char]+=P
        except:
            …
psihodelia
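A dict lookup is O(1) on average, so a loop like the one above is already O(n); the cost is mostly per-character Python overhead, plus exception handling on each first occurrence. A minimal sketch using `collections.Counter`, which in CPython counts an iterable with a C-accelerated helper and is typically faster than an equivalent Python loop. Counting integers and dividing once at the end also avoids accumulating floating-point error from adding `1.0/s` repeatedly. `char_frequencies` is an illustrative name.

```python
from collections import Counter


def char_frequencies(text):
    """Relative frequency of each character, in one O(n) pass."""
    if not text:
        return {}
    counts = Counter(text)  # integer counts; no repeated float additions
    n = len(text)
    return {ch: c / n for ch, c in counts.items()}


freqs = char_frequencies("abracadabra")
```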
25 votes, 7 answers

Generate a random number with a given probability in MATLAB

I want to generate a random number with a given probability, but I'm not sure how. I need a number between 1 and 3:

    num = ceil(rand*3);

but I need different values to have different probabilities of being generated, e.g. a 0.5 chance of 1, a 0.1 chance of…
Eamonn McEvoy
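The usual MATLAB-style answer compares a uniform draw against the cumulative sum of the probabilities (inverse transform sampling). A different technique, Walker's alias method (here in Vose's formulation), spends O(n) on a setup table so that every subsequent draw costs O(1), which can pay off when many numbers are drawn. The sketch is Python rather than MATLAB; the question is truncated, so only the 0.5 chance of 1 comes from it, and assigning 0.1 to 2 and the remaining 0.4 to 3 is an assumption for illustration.

```python
import random
from collections import Counter


def build_alias_table(probs):
    """Vose's alias method: O(n) setup so that each later draw costs O(1)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    accept = [1.0] * n
    alias = list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]  # keep column s with this probability...
        alias[s] = l  # ...otherwise fall through to l
        scaled[l] -= 1.0 - scaled[s]  # l donated the rest of column s
        (small if scaled[l] < 1.0 else large).append(l)
    # Columns left in either list are full (1.0 up to rounding), so they always accept.
    return accept, alias


def draw(values, accept, alias, rng=random):
    i = rng.randrange(len(values))
    return values[i] if rng.random() < accept[i] else values[alias[i]]


VALUES = [1, 2, 3]
PROBS = [0.5, 0.1, 0.4]  # 0.5 for 1 is from the question; the rest are assumed
accept, alias = build_alias_table(PROBS)
rng = random.Random(7)
counts = Counter(draw(VALUES, accept, alias, rng) for _ in range(100_000))
```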
24 votes, 7 answers

Computing similarity between two lists

EDIT: since everyone is getting confused, I want to simplify my question. I have two ordered lists, and I just want to compute how similar one list is to the other. For example: 1,7,4,5,8,9 and 1,7,5,4,9,6. What is a good measure of similarity between these two…
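Since the example lists do not contain the same elements (8 appears only in the first, 6 only in the second), one option is to report two numbers: how much the lists overlap as sets (Jaccard similarity), and how often pairs of shared items appear in the same order. The second is a rescaled Kendall tau restricted to shared items (tau = 2 × agreement - 1 when there are no ties). A minimal Python sketch, assuming items are distinct within each list; the function names are illustrative.

```python
from itertools import combinations


def order_agreement(a, b):
    """Fraction of pairs of shared items that appear in the same relative order in a and b.
    Assumes items are distinct within each list; returns None if fewer than two items are shared."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    shared = [x for x in a if x in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None
    same = sum((pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs)
    return same / len(pairs)


def jaccard(a, b):
    """Overlap of the two lists viewed as sets: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


a = [1, 7, 4, 5, 8, 9]
b = [1, 7, 5, 4, 9, 6]
print(order_agreement(a, b), jaccard(a, b))
```

For the example, five items are shared, and of their ten pairs only (4, 5) is swapped, so the order agreement is 9/10.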
24 votes, 4 answers

Confidence interval for binomial data in R?

I know that I need the mean and standard deviation to find the interval. However, what if the question is: For a survey of 1,000 randomly chosen workers, 520 of them are female. Create a 95% confidence interval for the proportion of workers who are female based on…
Pig
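For a proportion, the sample proportion p̂ = 520/1000 = 0.52 plays the role of the mean, and its standard error is sqrt(p̂(1 - p̂)/n), so no separate standard deviation is needed. A Python sketch of the textbook normal-approximation (Wald) interval and the Wilson score interval; in R itself, `prop.test` and `binom.test` both report a confidence interval for a proportion.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval, usually better behaved for small n or p near 0 or 1."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


print(wald_interval(520, 1000))
print(wilson_interval(520, 1000))
```

With n = 1000 and p̂ = 0.52 the two intervals nearly coincide (roughly 0.489 to 0.551); they diverge for small samples.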
23 votes, 2 answers

Fitting distributions, goodness of fit, p-value. Is it possible to do this with Scipy (Python)?

INTRODUCTION: I'm a bioinformatician. In my analysis, which I perform on all human genes (about 20,000), I search for a particular short sequence motif to check how many times this motif occurs in each gene. Genes are 'written' in a linear sequence…
s_sherly
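SciPy provides `scipy.stats.kstest` for a Kolmogorov-Smirnov goodness-of-fit test, along with `fit` methods on its distributions. To show what the test measures, the stdlib-only sketch below fits a normal model with `statistics.NormalDist.from_samples` and computes the KS statistic D, the largest gap between the empirical CDF and the model CDF. The data are synthetic; note that when parameters are estimated from the same data, the standard KS p-values are too optimistic (the Lilliefors correction addresses this for the normal case).

```python
import random
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


rng = random.Random(0)
normal_data = [rng.gauss(10, 2) for _ in range(2000)]
skewed_data = [rng.expovariate(1.0) for _ in range(2000)]

# Fit a normal model to each sample (mean and standard deviation), then measure the fit.
for name, data in [("normal", normal_data), ("exponential", skewed_data)]:
    model = NormalDist.from_samples(data)
    print(name, round(ks_statistic(data, model.cdf), 3))
```

A small D (the normal sample) indicates a good fit; the skewed exponential sample gives a much larger D under a normal model.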
23 votes, 1 answer

sampling multinomial from small log probability vectors in numpy/scipy

Is there a function in numpy/scipy that lets you sample from a multinomial distribution given a vector of small log probabilities, without losing precision? Example:

    # sample element randomly from these log probabilities
    l = [-900, -1680]

the naive method fails because…
lgd
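Exponentiating values like -900 directly underflows to 0.0 in double precision. The usual remedy is to subtract the maximum log-probability first (the log-sum-exp trick), so the largest weight becomes exactly 1.0 and the relative weights are preserved; SciPy exposes this as `scipy.special.logsumexp`. A stdlib sketch; `sample_index` is an illustrative name.

```python
import math
import random
from collections import Counter


def logsumexp(log_p):
    """log(sum(exp(x) for x in log_p)), computed without underflow by factoring out the max."""
    m = max(log_p)
    return m + math.log(sum(math.exp(x - m) for x in log_p))


def sample_index(log_p, rng=random):
    """Draw index i with probability exp(log_p[i]) / sum_j exp(log_p[j])."""
    m = max(log_p)
    weights = [math.exp(x - m) for x in log_p]  # the largest weight is exactly 1.0
    return rng.choices(range(len(log_p)), weights=weights, k=1)[0]


l = [-900, -1680]
normalised = [x - logsumexp(l) for x in l]  # normalised log-probabilities
rng = random.Random(0)
draws = Counter(sample_index(l, rng) for _ in range(1000))  # index 1 has relative weight e**-780
```

An alternative with the same effect is the Gumbel-max trick: add independent Gumbel noise to each log-probability and take the argmax.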
23 votes, 4 answers

Probability Random Number Generator

Let's say I'm writing a simple game of luck: each player presses Enter and the game assigns them a random number between 1 and 6, just like a die. At the end of the game, the player with the highest number wins. Now, let's say I'm a cheater. I want to…
Alon Gubkin
23 votes, 5 answers

Algorithm to generate Poisson and binomial random numbers?

I've been looking around, but I'm not sure how to do it. I've found this page which, in the last paragraph, says: A simple generator for random numbers taken from a Poisson distribution is obtained using this simple recipe: if x1, x2, ... is a…
snap
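A minimal sketch assuming only a uniform random generator: Knuth's multiplication method for Poisson variates (multiply uniforms until the product drops to e^(-λ) or below), and a binomial variate as the number of successes in n Bernoulli trials. This is not necessarily the recipe on the page the question cites. Both run in time proportional to λ and n respectively, so they suit small parameters; for large λ, e^(-λ) underflows and other methods are needed.

```python
import math
import random


def poisson(lam, rng=random):
    """Knuth's method: multiply uniforms until the product drops to e**-lam or below;
    the number of extra factors needed is Poisson(lam)."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def binomial(n, p, rng=random):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))
```

The method works because -log(U) is exponentially distributed, so multiplying uniforms corresponds to summing exponential inter-arrival times of a Poisson process until time λ is exceeded.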
22 votes, 10 answers

Representing continuous probability distributions

I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on…
Paul Johnson
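For independent quantities, the distribution of a sum (for example, departure time plus transit time) is the convolution of the two distributions. One simple representation for empirically determined PDFs is a histogram on a shared grid: bin k holds the probability mass near start + k·step, and adding two independent variables becomes a discrete convolution of the mass lists. `GridDist` and `add_independent` are hypothetical names; the sketch assumes independence and equal bin widths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridDist:
    """Distribution discretized onto bins: bin k has centre start + k*step and mass masses[k]."""
    start: float
    step: float
    masses: List[float]

    def mean(self):
        return sum((self.start + k * self.step) * m for k, m in enumerate(self.masses))


def add_independent(x, y):
    """Distribution of X + Y for independent X, Y on grids with the same step (discrete convolution)."""
    if abs(x.step - y.step) > 1e-12:
        raise ValueError("both grids must use the same step")
    out = [0.0] * (len(x.masses) + len(y.masses) - 1)
    for i, p in enumerate(x.masses):
        for j, q in enumerate(y.masses):
            out[i + j] += p * q  # centre (x.start + i*step) + (y.start + j*step)
    return GridDist(x.start + y.start, x.step, out)


# Uniform(0, 1) as 100 bins of width 0.01; the sum of two is triangular on (0, 2).
uniform = GridDist(start=0.005, step=0.01, masses=[0.01] * 100)
total = add_independent(uniform, uniform)
```

Other operations (differences, maxima) can be built the same way by combining bins, or the PDFs can instead be represented by samples and combined element-wise (Monte Carlo).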
22 votes, 2 answers

PyMC3 Bayesian Linear Regression prediction with sklearn.datasets

I've been trying to implement Bayesian linear regression models using PyMC3 with real data (i.e., not generated from a linear function plus Gaussian noise) from the datasets in sklearn.datasets. I chose the regression dataset with the smallest number of attributes…
O.rka
22 votes, 3 answers

How to properly hash the custom struct?

In C++, there is a default hash function template, std::hash, for the simplest types, such as std::string, int, etc. I suppose that these functions have good entropy and that the corresponding random variable distribution is…
abyss.7
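In C++ the usual pattern is to specialize `std::hash` for the struct (or pass a hasher type to the unordered container) and combine the members' hashes, for example with Boost's `boost::hash_combine`. The Python analogue below illustrates the same two rules: objects that compare equal must hash equal, and the member hashes are combined, here by hashing a tuple of the fields. `Key` and `FrozenKey` are hypothetical example types.

```python
from dataclasses import dataclass


class Key:
    """Hand-written hashable key: __eq__ and __hash__ use the same fields."""

    __slots__ = ("name", "version")

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def _fields(self):
        return (self.name, self.version)

    def __eq__(self, other):
        if not isinstance(other, Key):
            return NotImplemented
        return self._fields() == other._fields()

    def __hash__(self):
        # Delegate mixing to the built-in tuple hash, which combines the field hashes.
        return hash(self._fields())


@dataclass(frozen=True)
class FrozenKey:
    """frozen=True (with the default eq=True) makes dataclass generate __hash__ from the fields."""
    name: str
    version: int
```

A hand-written key like `Key` must not be mutated while it is stored in a set or dict, since its hash would change; the frozen dataclass rules that out.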