Questions tagged [probability]

Consider whether your question would be better asked at stats.stackexchange.com. Probability touches upon uncertainty, random phenomena, random numbers, random variables, probability distributions, sampling, and combinatorics.

See also https://statistics.stackexchange.com

Probability theory is a branch of mathematics that studies uncertainty and random phenomena. It operates by introducing a sample space (a set), and associating probabilities (numbers between 0 and 1, inclusive) to certain subsets of this set, in a manner that satisfies some sensible axioms. If the sample space can be thought of as the real line, we obtain random variables; if it is a Euclidean space, we obtain random vectors. Random variables and random vectors have associated probability distributions, which can be characterized by probability density functions, cumulative distribution functions, moments, and characteristic or moment-generating functions.

Typically, questions with this tag will deal with computing (exactly or approximately) probabilities of certain events (from winning a lottery to server outages), drawing random samples, approximating distributions, etc. There might be an overlap with statistics and/or statistical packages (R, SAS, Stata).
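
As a rough illustration of the "exactly or approximately" distinction above, here is a minimal Python sketch. The two-dice sample space, the "sum is 7" event, and the function names are all assumptions chosen for the example; they do not come from any question under this tag. It computes a probability exactly by enumerating a finite sample space, then approximates the same probability by Monte Carlo sampling.

```python
import random
from fractions import Fraction
from itertools import product

# Sample space (illustrative assumption): all ordered outcomes of two fair dice.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))


def is_event(outcome):
    """Event of interest (hypothetical choice): the two dice sum to 7."""
    return sum(outcome) == 7


def exact_probability(event):
    """Exact probability under the uniform measure: favourable / total outcomes."""
    favourable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favourable, len(SAMPLE_SPACE))


def estimated_probability(event, trials=100_000, seed=0):
    """Monte Carlo estimate: fraction of random draws in which the event occurs."""
    rng = random.Random(seed)
    hits = sum(event(rng.choice(SAMPLE_SPACE)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("exact:    ", exact_probability(is_event))
    print("estimated:", estimated_probability(is_event))
```

Enumeration only works when the sample space is small and finite; the sampling approach trades exactness for generality, and its error shrinks roughly with the square root of the number of trials.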

Synonym: probability-theory

4021 questions
17
votes
2 answers

How does the predict_proba() function in LightGBM work internally?

This is in reference to understanding, internally, how the probabilities for a class are predicted using LightGBM. Other packages, like sklearn, provide thorough detail for their classifiers. For example: LogisticRegression returns: Probability…
artemis
  • 6,857
  • 11
  • 46
  • 99
17
votes
2 answers

Effective Java Item 47: Know and use your libraries - Flawed random integer method example

In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states. The method from the book is: private static final Random rnd = new…
17
votes
8 answers

Creating your own Tinyurl style uid

I'm writing a small article on human-readable alternatives to GUIDs/UIDs, for example those used on TinyURL for the URL hashes (which are often printed in magazines, so need to be short). The simple uid I'm generating is - 6 characters: either a…
Chris S
  • 64,770
  • 52
  • 221
  • 239
17
votes
5 answers

Generate random numbers distributed by Zipf

The Zipf probability distribution is often used to model file size distributions or item access distributions in P2P systems (e.g. "Web Caching and Zipf-like Distributions: Evidence and Implications"), but neither Boost nor the GSL (GNU…
dmeister
  • 34,704
  • 19
  • 73
  • 95
16
votes
4 answers

Seeking suggestions for data representation of a probability distribution

I'm looking for an elegant and efficient way to represent and store an arbitrary probability distribution constructed by explicit sampling. The distribution is expected to have the following properties: Samples are floating point values, but in…
George Skoptsov
  • 3,831
  • 1
  • 26
  • 44
16
votes
2 answers

Sigmoid output - can it be interpreted as probability?

Sigmoid function outputs a number between 0 and 1. Is this a probability or is it merely a 'yes or no' depending on whether it's above or below 0.5? Minimal example: Cats vs dogs binary classification. 0 is cat, 1 is dog. Can I perform the…
Voy
  • 5,286
  • 1
  • 49
  • 59
16
votes
3 answers

Probability and Neural Networks

Is it good practice to use sigmoid or tanh output layers in neural networks directly to estimate probabilities, i.e. to treat the output of the sigmoid function in the NN as the probability of a given input occurring? EDIT: I wanted to use a neural network to learn…
Betamoo
  • 14,964
  • 25
  • 75
  • 109
16
votes
4 answers

Probability of 64bit Hash Code Collisions

The book Numerical Recipes offers a method to calculate 64bit hash codes in order to reduce the number of collisions. The algorithm is shown at http://www.javamex.com/tutorials/collections/strong_hash_code_implementation_2.shtml and is copied here…
isapir
  • 21,295
  • 13
  • 115
  • 116
15
votes
3 answers

Fast weighted random selection from very large set of values

I'm currently working on a problem that requires the random selection of an element from a set. Each of the elements has a weight (selection probability) associated with it. My problem is that for sets with a small number of elements, say 5-10, the…
user760162
15
votes
6 answers

How to shorten UUID V4 without making it non-unique/guessable

I have to generate a unique URL part which will be "unguessable" and "resistant" to brute-force attacks. It also has to be as short as possible :) and all generated values have to be of the same length. I was thinking about using UUID V4 which can be…
user606521
  • 14,486
  • 30
  • 113
  • 204
15
votes
3 answers

Percentage Based Probability

I have this code snippet: Random rand = new Random(); int chance = rand.Next(1, 101); if (chance <= 25) // probability of 25% { Console.WriteLine("You win"); } else { Console.WriteLine("You lose"); } My question is, does it really…
BlueRay101
  • 1,447
  • 2
  • 18
  • 29
15
votes
5 answers

Estimating/forecasting download completion time

We've all poked fun at the 'X minutes remaining' dialog which seems to be too simplistic, but how can we improve it? Effectively, the input is the set of download speeds up to the current time, and we need to use this to estimate the completion…
Phil H
  • 19,928
  • 7
  • 68
  • 105
15
votes
15 answers

Is this a good or bad 'simulation' for Monty Hall? How come?

While trying to explain the Monty Hall problem to a friend during class yesterday, we ended up coding it in Python to prove that if you always swap, you will win 2/3 of the time. We came up with this: import random as r #iterations = int(raw_input("How…
Josh Hunt
  • 14,225
  • 26
  • 79
  • 98
14
votes
5 answers

Choose random array element satisfying certain property

Suppose I have a list, called elements, each element of which does or does not satisfy some boolean property p. I want to choose one of the elements that satisfies p at random with uniform distribution. I do not know ahead of time how many items satisfy…
Paul Reiners
  • 8,576
  • 33
  • 117
  • 202
14
votes
3 answers

How to show that an NDCG score is significant

Suppose the NDCG score for my retrieval system is 0.8. How do I interpret this score? How do I tell the reader that this score is significant?
Programmer
  • 6,565
  • 25
  • 78
  • 125