
A number like:

0.000000000000000000000000000000000000000123456

is difficult to store without a large performance penalty using the available numeric types in Postgres. This question addresses a similar problem, but I don't feel it came to an acceptable resolution. One of my colleagues has currently landed on rounding numbers like this to 15 decimal places and just storing them as:

0.000000000000001

So that the double precision numeric type can be used, which avoids the penalty associated with moving to a decimal numeric type. For my purposes, numbers this small are more or less functionally equivalent, because they are both very small (and mean more or less the same thing). However, we are graphing these results, and when a large portion of the data set gets rounded like this, it looks exceptionally stupid (a flat line on the graph).

[graph: with the rounding applied, a large portion of the data set collapses onto a single flat line]

Because we are storing tens of thousands of these numbers and operating on them, the decimal numeric type is not a good option for us as the performance penalty is too large.

I am a scientist, and my natural inclination would be just to store these types of numbers in scientific notation, but it doesn't appear that Postgres has this kind of functionality. I don't actually need all of the precision in the number; I just want to preserve 4 digits or so, so I don't even need the 15 digits that the float numeric type offers. What are the advantages and disadvantages of storing these numbers in two fields like this:

1.234 (real)
-40 (smallint)

where this is equivalent to 1.234 * 10^-40? This would allow for ~32,000 leading decimal places, with only 2 bytes used to store the exponent and 4 bytes to store the real value, for a total of at most 6 bytes per number (it gives me exactly the number I want to store and takes less space than the existing solution, which consumes 8 bytes). It also seems like sorting these numbers would be much improved, as you'd only need to sort on the smallint field first and then on the real field.
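
Roughly, the kind of schema I'm imagining would look like this (table and column names are just placeholders):

    CREATE TABLE measurement (
        id       serial PRIMARY KEY,
        mantissa real,     -- e.g. 1.234
        exponent smallint  -- e.g. -40, i.e. the stored value is mantissa * 10^exponent
    );

    -- sorting would only need to look at the exponent first, then the mantissa
    SELECT mantissa, exponent FROM measurement ORDER BY exponent, mantissa;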

sage88
    Pretty sure I don't understand. The IEEE standard 8-byte format for double precision numbers enables the representation of very small numbers as a significand and exponent. If you can live with about 15 digits of precision, why not just use the double precision format? – scottb Feb 07 '16 at 22:46
  • @scottb Because, as I mentioned in the question, rounding all of my results to 15 digit precision causes my graphs to flatline at that precision, so while the meaning of that data is functionally equivalent, the display looks horrible. – sage88 Feb 07 '16 at 22:48
  • If your graphs are "flatlining", it is more likely that there is a problem with the scaling of your graphs or unintended casting of your values to integral types. IEEE double precision numbers are as capable of representing very small numbers, to the limits of precision, as they are very large ones. In particular, there should be no advantage to separately storing the significand and exponent in your database, as you've proposed, since that is how double precision numbers are represented internally. – scottb Feb 07 '16 at 22:51
  • *"rounding all of my results to 15 digit precision causes my graphs to flatline"* ... just to clarify. What you described in your post was not rounding to 15 digits of precision. In fact, you lost a great deal of precision in the operation. More accurately, the result was truncated to 1 significant digit. – scottb Feb 07 '16 at 22:59
  • Definitely agree with @scottb, what's the issue with using `double precision` aka `float`? Are you sure you're not rounding/truncating somewhere in the chain between Postgresql and the graphs? Of course if your numbers are actually more like `123.000000000000000000000456` than `0.000000000000000000000456` you'll have an issue, but that would be the case with other representations as well. – jcaron Feb 07 '16 at 22:59
  • @scottb See the graph above where all numbers are smaller than 0.000000000000001. – sage88 Feb 07 '16 at 22:59
  • If you round everything to `0.000000000000001`, then obviously everything will be `0.000000000000001`. Why would you need to round the numbers? – jcaron Feb 07 '16 at 23:01
  • @scottb I completely agree that this rounding approach is not a good one, as accuracy is definitely lost (these numbers would be better represented as just 0.000000000000000). – sage88 Feb 07 '16 at 23:01
  • @jcaron because the double precision number type in postgres only allows 15 digits of decimal precision. – sage88 Feb 07 '16 at 23:02
  • @sage88: very large and very small numbers in IEEE floating point formats are not represented internally with leading or trailing zeros. The significand starts with the most significant digit and then includes up to 14 (or so) additional digits. The number would not be represented as 0.000000000000000001, but as 1.0E-18 (or whatever). – scottb Feb 07 '16 at 23:02
  • @scottb alright, I thought that was the case; my colleague must have explained this to me incorrectly, so I'm going to need to discuss this with him. If what you're saying is correct, then a real numeric type would work just as well? – sage88 Feb 07 '16 at 23:04
  • In a floating point number, 15 digits of precision does not mean 15 digits after the decimal point. It means you've got a total of 15 digits, but they may be after 200 zeroes. A `double precision` number will be able to store 0.12345678901234e-200 if needed! With the obvious limitations of floating point regarding accurate representation/rounding, of course. – jcaron Feb 07 '16 at 23:04
  • @sage88: I think you should understand that in IEEE formats, numbers like 0.000000000000000000000000000000000000000123456 are stored without leading zeros as 1.23456E-40 (just as they would be displayed on a scientific calculator with the ability to display a maximum of 15 digits). – scottb Feb 07 '16 at 23:06
  • jcaron and scottb thanks for the clarification. I'm going to have to get a better explanation from my colleague as to why he made this design decision. I really hate it, so this helped me understand even better why it was a terrible design. – sage88 Feb 07 '16 at 23:08
  • BTW _tens of thousands_ is a small quantity. – Clodoaldo Neto Feb 07 '16 at 23:12
  • Note that the `numeric` aka `decimal` data types are very different from the `real`/`float`/`double precision` data types. Do not confuse them! – jcaron Feb 07 '16 at 23:32

2 Answers


You and/or your colleague seem to be confused about what numbers can be represented using the floating point formats.

A double precision (aka float) number can store at least 15 significant digits, in the range from about 1e-307 to 1e+308. You have to think of it as scientific notation: remove all the leading zeroes and move them into the exponent. If whatever you have, once in scientific notation, has fewer than 15 significant digits and an exponent between -307 and +308, it can be stored as is.

That means that 0.000000000000000000000000000000000000000123456 can definitely be stored as a double precision, and you'll keep all the significant digits (123456). No need to round that to 0.000000000000001 or anything like that.
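
For instance, a quick check in psql (this is just your example value cast to double precision):

    SELECT 0.000000000000000000000000000000000000000123456::double precision;
    -- 1.23456e-40, all six significant digits preserved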

Floating point numbers have the well-known issue of not representing every decimal number exactly (decimal fractions in base 10 do not necessarily have an exact representation in base 2), but that's probably not an issue for you (it only matters if you need to do exact comparisons on such numbers).
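
The classic illustration of that caveat, in case it ever matters for you:

    SELECT 0.1::double precision + 0.2::double precision = 0.3::double precision;
    -- false, because 0.1 and 0.2 have no exact base-2 representation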

jcaron
  • Thank you for the help. I will likely mark this as the correct answer. Since @scottb already had me headed down the correct path I'll give him a chance to answer first if he wants to and then pick the answer that gets at this the best. – sage88 Feb 07 '16 at 23:25
  • @sage88 so you don't need to save space nor have up to 32000 decimals, then ...? – Ilya Feb 08 '16 at 08:12

What are the advantages and disadvantages of storing these numbers in two fields like this

You'll have to manage 2 columns instead of one.

Roughly, what you'll be doing is saving space by storing lower-precision floats. If you only need 4 digits of precision, you can go further and save 2 more bytes by using smallint + smallint (a 1000-9999 mantissa plus the exponent). Using that format, you could also cram the two smallints into one 32-bit int (exponent * 2^16 + mantissa); that should work too.

That's assuming that you need to save storage space and/or need to go beyond the +/-308 exponent limit of the double precision float. If that's not the case, the standard format is fine.
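
If you do go the packed route, here's a rough sketch of the idea (the values and the bias of 16384 are arbitrary choices that keep the packed value inside a signed 32-bit int):

    -- pack: biased exponent in the upper 16 bits, 4-digit mantissa in the lower 16
    SELECT ((-40 + 16384) << 16) | 1234 AS packed;

    -- unpack
    SELECT (packed >> 16) - 16384 AS exponent,
           packed & 65535         AS mantissa
    FROM (SELECT ((-40 + 16384) << 16) | 1234 AS packed) t;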

Ilya