Questions tagged [birthday-paradox]

The Birthday Paradox is a phenomenon in probability in which the probability of a population containing two individuals with the same property is much greater than would be intuitively expected. In its original form it describes the likelihood that any two individuals in a room share a birthday. Amongst other things, the Birthday Paradox affects cryptography, hashing and various applications of random number generators.

The Birthday Paradox is the observation that, among samples drawn from a population, two are far more likely to share the same value than intuition suggests. The classic example is the birthdays of individuals in a room. Amongst other things, the phenomenon affects hashing, cryptography and various applications of random number generators.

Overview

The problem was originally posed as the probability that any two people in a room share a birthday. The key point is that any pair of people could be the one that matches. People tend to misread the problem as the probability of someone in the room sharing a birthday with one specific individual, and this misreading is the usual reason the probability is underestimated. There is no requirement for the match to involve a specific individual; any two individuals could match.

This graph shows the probability of a shared birthday as the number of people in the room increases. For 23 people, the probability of two sharing a birthday is just over 50%.

The probability of a match between any two individuals is much higher than the probability of a match to a specific individual, because the match does not have to fall on a specific date. You only have to find two individuals who share the same birthday, whatever that date is. From this graph (which can be found on the Wikipedia page on the subject), we can see that only 23 people are needed in the room for there to be a 50% chance of finding two who match in this way.
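
The difference between the two readings can be checked numerically. The following is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function names are illustrative and not taken from any question under this tag.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for i in range(people):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def match_specific_probability(others, days=365):
    """Probability that at least one of `others` matches one fixed birthday."""
    return 1.0 - ((days - 1) / days) ** others


def smallest_group(target=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches `target`."""
    n = 0
    while shared_birthday_probability(n, days) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(f"any pair among 23 people: {shared_birthday_probability(23):.4f}")
    print(f"22 others vs one person:  {match_specific_probability(22):.4f}")
    print(f"smallest group for 50%:   {smallest_group()}")
```

With 23 people the any-pair probability comes out just over one half, while the chance that one of the 22 others matches a particular person stays below 6%.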

Some applications

  • Cryptographic hashing relies on the low probability of any two items having the same hash value. To achieve this, the hash value must be wide enough to make it computationally infeasible to find two items that hash to the same value. Because of the Birthday Paradox, a collision among random inputs to an n-bit hash is expected after roughly 2^(n/2) items rather than 2^n, so the width of the hash must be very large for the probability of a collision to be sufficiently low.

  • In a hash table, the Birthday Paradox makes collisions quite likely unless you have a perfect hash function that maps a controlled set of source values to a unique set of hash values. In all other cases, the hash table must be able to deal with relatively frequent collisions.

  • If you need a guaranteed unique sequence of random numbers, you cannot rely on simply generating random numbers, as the probability of a collision is quite high for relatively small sample sets. Instead, a set of unique numbers must be generated (perhaps just sequentially) and shuffled into a random order.
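
The applications above can be sketched in a few lines. This is an illustration only: it uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n samples drawn uniformly from N possible values, and the function names are hypothetical. Note that Python's `random` module is not suitable for security-sensitive identifiers.

```python
import math
import random


def collision_probability(samples, space):
    """Approximate chance of at least one collision among `samples`
    values drawn uniformly from `space` possibilities."""
    return -math.expm1(-samples * (samples - 1) / (2 * space))


def samples_for_probability(p, space):
    """Approximate number of samples needed to reach collision chance `p`."""
    return math.sqrt(2 * space * -math.log1p(-p))


def unique_random_sequence(count, seed=None):
    """Return 0..count-1 in random order, so no value can repeat.
    Generates the unique values first, then shuffles them."""
    values = list(range(count))
    random.Random(seed).shuffle(values)
    return values


if __name__ == "__main__":
    # A 32-bit hash reaches a 50% collision chance after tens of
    # thousands of items, far fewer than the 2**32 possible values.
    print(round(samples_for_probability(0.5, 2 ** 32)))
    print(f"{collision_probability(23, 365):.3f}")
    print(unique_random_sequence(10, seed=1))
```

Shuffling a pre-generated range trades memory for a uniqueness guarantee, which only works when the range is small enough to hold in memory.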

The tag is most applicable to questions about applications directly affected by the phenomenon (e.g. cryptographic hashing), or to re-tagging a question where the poster has displayed obvious naivety about it.

57 questions
2
votes
2 answers

Doing a Monte Carlo Analysis of the Birthday Paradox using a HashSet

DISCLAIMER : I DO NOT WANT THE ANSWER TO THIS PROBLEM. I SIMPLY NEED SOME GUIDANCE. I want to perform Monte Carlo analysis on the infamous Birthday Paradox (determining the probability that at least 2 people in a given group share the same birthday)…
2
votes
1 answer

Calculate original set size after hash collisions have occurred

You have an empty ice cube tray which has n little ice cube buckets, forming a natural hash space that's easy to visualize. Your friend has k pennies which he likes to put in ice cube trays. He uses a random number generator repeatedly to choose…
ʞɔıu
  • 47,148
  • 35
  • 106
  • 149
2
votes
2 answers

Generalised Birthday Calculation Given Hash Length

Let us assume that we are given the following: The length of the hash The chance of obtaining a collision Now, knowing the above, how can we obtain the number of "samples" needed to obtain the given chance percentage?
rameezk
  • 353
  • 2
  • 5
  • 15
1
vote
1 answer

birthday paradox function in R

I'm a beginner in R and am trying to create a birthday paradox function and managed to reach this point, and the result is approximately 0.5, as expected. k <- 23 sims <- 1000 event <- 0 for (i in 1:sims) { days <- sample(1:365, k, replace =…
YL101
  • 13
  • 2
1
vote
1 answer

How to tackle the Birthday Paradox Problem in Python?

I'm practicing the Birthday Paradox problem in Python. I've run it a bunch of times, with changing the random number of birthdays and loop run number, but the probability is either 0 or 100%, and I was unable to get other probability like 50%…
1
vote
0 answers

What's wrong with this code that tries to calculate the birthday paradox with ArrayLists?

The odds should be like 10% more to my understanding. However I can't ever get past 7 percent even when doing 7000 tests. I figure it must be the calculation for count of matches, but I can't figure out how it's wrong. import…
1
vote
1 answer

Is there a reverse way to find number of people with given 0.5 probability that two people will have same birthday but no using mathematical formula?

I'm doing birthday paradox, and want to know how many people can meet 0.5 probability that two people have same birthday by using python. I have tried no using mathematical formula to find probability with given the number of people by using random…
1
vote
1 answer

The Birthday Problem - at least 2 out of N

I received a bit of a modified birthday problem- I need to run a function that returns the probability that at least two out of N persons share the same birthday. Then a main function that calculates the minimal n such that this probability is at…
Yuki1112
  • 365
  • 2
  • 12
1
vote
2 answers

Birthday Paradox - Function with input variable

I am trying to simulate the probability that more than two students have the same birthday in a room full of n people. Currently I think my code is working properly, although I have to initially just run the first line of code to select my n value,…
Aesler
  • 181
  • 10
1
vote
1 answer

How can I find a collision for a toy hash function?

I'd like to find a collision for a simple hash function below (python): def hash_function(s=''): # 'Hello World!' -> 7b2ea1ba a, b, c, d = 0xa0, 0xb1, 0x11, 0x4d result_hash = '' for byte in bytes(s, 'ascii'): a ^= byte …
Denis Yakovenko
  • 3,241
  • 6
  • 48
  • 82
1
vote
4 answers

C++ Birthday Paradox Using a Boolean Function

I have an assignment where I need to calculate the probability that two people share the same birthday for a given room size (in my case 50) over many trials (5000). I have to assign the birthdays randomly to the number of people in the room. The…
AlecWhite
  • 11
  • 2
1
vote
2 answers

Java: How to create a room with people having random birthdays?

This is my second day on Java. I came across an interesting question on the Birthday Paradox. Generate a random birthday. Create a Person with a random birthday. Build a function to check if two persons have the same birthday. Create a Room with a…
1
vote
1 answer

What is the best way to generate *non-repeating* securely random numbers?

I'm working on something that needs to assign securely random, short (~40 bit) IDs. They need to be unique, which means doing it on a central server. Just using a new SecureRand each time would run into the birthday problem and start taking longer…
Sai
  • 6,919
  • 6
  • 42
  • 54
1
vote
2 answers

Calculating the probability of at least 2 duplicates in a world with 400 tiles and 50 objects? Java

First of all I want to let you know that I have been searching for some days now for an answer or something that could perhaps help me out a bit but I couldn't find anything so I am asking here. I have in my Java code: An arraylist of 50…
Burbanana
  • 49
  • 1
  • 3
  • 12
1
vote
1 answer

Having trouble with large numbers in Python

running into problems with: from pylab import * x=arange(0,365,1) y = [] for j in x: y.append(1-((math.factorial(365)/math.factorial(365-j))/(365**j))) plot(x,y) show() Any thoughts? I'm running python 2.7
Overtim3
  • 85
  • 1
  • 2
  • 13