What data type should be used to store a probability (which therefore ranges from 0 to 1)? Is there a more efficient way than using a `double` or `float` with value control (0 ≤ x ≤ 1)?

- Why is this inefficient? – duffymo Jan 04 '18 at 15:19
- Just a note, `0 <= x <= 1` is misleading C syntax; it does not do what is (wrongly) expected of it. – Sourav Ghosh Jan 04 '18 at 15:20
- You can use almost any type you like (be it an int or a float). It all depends how many outcomes you'd like to have (e.g. `int` vs `long long`) and what sort of code you're willing to write. – byxor Jan 04 '18 at 15:22
- @SouravGhosh I just edited the OP – Robb1 Jan 04 '18 at 15:22
- What kind of granularity do you need? Would an int not work from, say, 0-10000, and you just present it as a percentage? – Nick Jan 04 '18 at 15:24
- The benefit of a floating point data type is that it can handle very large and very small numbers, while also giving fine precision between 0 and 1. I recommend a 64-bit floating point type. – Ctznkane525 Jan 04 '18 at 15:25
- It really depends on what you want to do with it. A `double` gives you much more precision near `0`, for small probabilities. If that is what you need, fine. If you need more precision near `1`, you could use `y = 1-x`. If you need the same precision everywhere, an integer type would be more appropriate. – Jens Gustedt Jan 04 '18 at 15:28
- @Ctznkane525 I don't know if your reasoning holds: either you use a float (whatever the number of bits) but only the part from 0 to 1, in which case you "waste" all the part from `MIN_FLOAT` to `0` and from `1` to `MAX_FLOAT`, or you scale it to [0;1], but then you have all the problems that come with floating point numbers: for instance, your probabilities can be much more precise if they are near 0 than if they are close to 1 (or precise around .5 and imprecise around 0 and 1 if you also scale the negative part). Using fixed point numbers gives you the same precision everywhere. – Bromind Jan 04 '18 at 15:32
- What do you mean by "efficient"? – klutt Jan 04 '18 at 15:34
- @Bromind While I agree that fixed point numbers allow the same precision everywhere, floating points don't waste digits; that's the benefit of them. – Ctznkane525 Jan 04 '18 at 15:37
- Without more context this question cannot properly be answered. For example, when dealing with probabilities relating to 6-sided dice games, an integer fixed point base 6 representation might be most efficient (and accurate!). – Peter G. Jan 04 '18 at 15:44
- @Ctznkane525 floating points do waste bits, e.g. if the exponent is `0`, then whatever the significand is, you still have `1` (well, in practice, some of the exponent-`0` floats are used to represent special values such as NaN or +/- infinity), but in the end, you can represent *fewer* distinct numbers than with fixed point (for the same number of bits). The only advantage of floating points is that the range from `MIN_FLOAT` to `MAX_FLOAT` is larger than `MIN_FIX` to `MAX_FIX`, but at a cost of precision. For instance, when the exponent is at its maximum, you cannot represent 2 successive integers. – Bromind Jan 04 '18 at 15:45
- agree with your statement – Ctznkane525 Jan 04 '18 at 15:48
- Probability should probably use `double`. If concerned about space, use `float` and sacrifice some precision. – chux - Reinstate Monica Jan 04 '18 at 16:01
- @chux: I couldn't agree more. Have we all lost our marbles here? Does the OP really need more than 15 decimal significant figures of accuracy in the range [0, 1]? Are they really willing to write all the mathematical functions that they may require to suit this new type that they are inventing? – Bathsheba Jan 04 '18 at 16:08
- @Bathsheba As OP did not state precision nor distribution requirements (linear, logarithmic, etc.), the post isn't well answerable. I suppose OP could use `unsigned char` for a small, yet imprecise form. Interestingly, there is [binary16](https://en.wikipedia.org/wiki/Half-precision_floating-point_format). Yet considering all the additional code needed, as you [commented](https://stackoverflow.com/questions/48098291/what-data-type-should-be-used-for-a-probability?noredirect=1#comment83173300_48098291), it is beyond reason to expect OP will arrive at a robust non-`float`/`double` solution. – chux - Reinstate Monica Jan 04 '18 at 16:18
-
Though it is implemented in node.js yet but probably [BigBit](https://bigbit.github.io/bigbitjs/) HB format can be the best fit as it'll not lose precision and will save space too. Disclaimer: I'm the author of it. – Amit Kumar Gupta Dec 27 '18 at 09:02
2 Answers
A common alternative choice is fixed-point arithmetic on `unsigned short` or `unsigned int`, with the radix point set to the far left: so, for the usual 16-bit `unsigned short`, the value range is either from 0.00000 = 0/65535 to 1.00000 = 65535/65535, or from 0.00000 = 0/65536 to 0.99998 = 65535/65536, depending on whether you would rather be able to represent 1.0 or 0.5 exactly.
The major advantages of this design are that the representable probabilities are spaced uniformly over the unit interval, and it's impossible for any calculation to produce a value outside the mathematically meaningful range. The major disadvantages are that P(AB) cannot be computed by simple multiplication, you have to choose which of 1.0 and 0.5 can be represented exactly, and underflow is much more likely to bite you. Performance is probably a wash on modern CPUs.
I don't know what you mean by "more efficient" so I can't be more specific.
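For concreteness, here is a minimal sketch of the fixed-point scheme described above, assuming the 0/65536 .. 65535/65536 variant (so 0.5 is exact but 1.0 is not). The names `prob16_t`, `PROB_SCALE` and `prob_mul` are purely illustrative, not any standard API:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative fixed-point probability: value represented = p / PROB_SCALE. */
typedef uint16_t prob16_t;

#define PROB_SCALE 65536u

/* P(A and B) for independent events: plain '*' would overflow and lose the
 * scale factor, so widen to 32 bits, multiply, then shift back down. */
static prob16_t prob_mul(prob16_t a, prob16_t b)
{
    return (prob16_t)(((uint32_t)a * b) >> 16);
}

/* Convert to double only for display. */
static double prob_to_double(prob16_t p)
{
    return p / (double)PROB_SCALE;
}

int main(void)
{
    prob16_t half = 0x8000;                  /* exactly 0.5 */
    prob16_t quarter = prob_mul(half, half);
    printf("0.5 * 0.5 = %f\n", prob_to_double(quarter));  /* prints 0.250000 */
    return 0;
}
```

The `prob_mul` helper is exactly the "P(AB) cannot be computed by simple multiplication" point: the product has to be rescaled back into the unit interval.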

Yes, there is a more efficient way to store a probability: just use a plain old normalized integer, where 1.0 corresponds to the largest value the integer type can hold on your machine.
That means you just need to scale the floating point number x (0 ≤ x ≤ 1) by that maximum. You can find the discussion of the maximum value here: What is the maximum value for an int32?
There are also other methods, like the Q number format, but it is typically used in TI DSP processor architectures: https://en.wikipedia.org/wiki/Q_%28number_format%29
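As a rough illustration of that mapping, assuming 1.0 is mapped to `UINT32_MAX` (the helper names `q32_from_double` and `q32_to_double` are made up for this sketch):

```c
#include <stdint.h>
#include <stdio.h>

/* Map 0.0 .. 1.0 onto 0 .. UINT32_MAX. */
static uint32_t q32_from_double(double x)
{
    if (x <= 0.0) return 0;            /* clamp so the result stays in range */
    if (x >= 1.0) return UINT32_MAX;
    return (uint32_t)(x * (double)UINT32_MAX + 0.5);  /* round to nearest */
}

static double q32_to_double(uint32_t p)
{
    return p / (double)UINT32_MAX;
}

int main(void)
{
    uint32_t p = q32_from_double(0.75);
    printf("stored as %u, reads back as %f\n", p, q32_to_double(p));
    return 0;
}
```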

- I'm not convinced by this. It depends on what you're doing with the probability. If you're applying a lot of floating point calculations to it, then unless you use a `double`, your poor runtime will spend an inordinate amount of time making unnecessary conversions. – Bathsheba Jan 04 '18 at 16:04
- I understand your point, but why would you use floating point operations if the probability is already represented in the more efficient fixed point arithmetic? Yes, it depends on the application, but that was not the question. In addition, hundreds of current TI DSP processors use fixed point and do very precise fixed point arithmetic. There are cases where a floating point processor or coprocessor is needed, but they are rare, and I have never seen the probability handled in floating point. However, it is easier to design software with floating point without thinking. – VladP Jan 04 '18 at 17:42