2

The Postgres database I manage contains many columns containing numbers between 0 and 100, with a fixed (never more than 3, usually 2) number of decimal places. Most of these numbers cluster around their respective averages, and thus repeat a lot, which if implemented correctly, could make them ideal types for GIN and GIST indexing. This ties in really well with our future development plans, since GIN and/or GIST would be necessary to index certain types of queries we're planning to implement in the future. The functionality that comes with ranges would also be useful, which also would become available if they were stored as integers.

Currently, they're stored as floats, but this has caused problems in the past, when trying to look up numbers by their exact value. I'm aware of the NUMERIC class in Postgres, but that seems to do the opposite of what I want, as far as performance. We would almost never do any math with these numbers within Postgres, and for any math we did do, it would be extremely basic and precision wouldn't be important. As such, these columns seem like ideal candidates for fixed-point number types. Basically, I want to make them stored as integers internally, but when returned in queries and when set via UPDATE and INSERT, it should move the decimal point, say, three places to the left.

From my reading so far, I think I have two options. I could create triggers on those columns to manipulate the input and output of values of what would internally be integer types, or I could create custom scalar types in C. The second option seems to be the better one, but I'd think that if it were that easy to do, someone would have already done it by now. Or maybe there's a better solution that I just haven't come across yet? I'm leaning towards creating a custom data type, but I'm still not sure if there's a good reason not to.

Oh, also virtually all of the queries on these data types would be whether they fell within a given range. Possibly some addition and subtraction, but very little multiplication or division, and probably never anything more advanced than that. The code would only run one production server, possibly a test server, and 4-5 development machines. I'm not sure how much of that is relevant, but I'm hoping someone can set me on the right track.

virnovus
  • 129
  • 1
  • 12
  • Store them as integers * 1000 (e.g. 42.42 would be stored as 42420) and when you select the values, divide them again. –  Mar 02 '16 at 22:34
  • I thought of that, but all of our code that runs on top of the database is expecting to get the number in a certain format. It'd be a huge mess to fix all that code, and not a very elegant implementation either. Hence, the possibility of doing it with triggers. SELECT, INSERT, and UPDATE would all get triggers to divide or multiply by 1000 depending on which way the data was going. I have no idea what the performance impact would be though. – virnovus Mar 02 '16 at 23:29
  • Why not use views then? –  Mar 03 '16 at 06:46
  • Updatable views seems the way to go, if performance is acceptable. Writing a custom type in C will have higher setup and maintenance cost (eg. for developing). You should test both. – hruske Mar 03 '16 at 08:03
  • We *could* use views. I'd been considering using a materialized view for current data and then archiving historical data to slower media. But the database is only about 15GB, and our server package comes with a 160GB SSD, so storage isn't a priority. Creating a custom type would allow us to use those types in other tables too though, not just our main one. Using views would require new views on every table we wanted to do that with. – virnovus Mar 03 '16 at 18:49

1 Answers1

2

Assuming you don't want to go the "multiply by 100" route, defining your own simple data-type isn't too bad assuming you know some "C". Of course, you're paying a cost in portability of your data and continued maintenance, but that might be worth it to you. Check into implications regarding arrays, ranges and indexing though - there might be more code to write than you want.

http://pgxn.org/tag/type/

There are two fixed-point decimal implementations there, based on 32/64-bit storage. The first of them is written by Pavel Stehule who is a well-known PostgreSQL community member. I haven't used either or looked at the code, but it should be a good starting point for tests.

Richard Huxton
  • 21,516
  • 3
  • 39
  • 51
  • 3
    Worth noting that Postgres already has a fixed-point numeric type, [`money`](http://www.postgresql.org/docs/current/static/datatype-money.html), so if you did decide to build your own, you could probably steal most of [the code](https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c). – Nick Barnes Mar 03 '16 at 07:19
  • I wrote a lot of C during my masters program, so I should be ok there. As far as portability, what about submitting the code as an extension? If I went that route, what would be the preferred type name? fixed4p2? fixed4_2? fixed42? fixed4(2)? Not sure what the convention would be in this case. Would it be a lot of extra work to create a type that would accept a parameter for number of digits, like numeric? This would probably tie in with our longer-term plans to set up GIN and GIST indexes, so we'd have to get pretty far into the weeds with our C code as it is. Thanks for the help! – virnovus Mar 03 '16 at 16:38
  • @NickBarnes Hmm, looks like the money type is based on bigint, not numeric like I thought it was. Good call. – virnovus Mar 03 '16 at 19:32
  • 1
    @virnovus: The problem with type parameters (typmods) is that values don't generally come with a typmod attached. You will (potentially) receive one in an I/O or cast routine, but in any of your arithmetic / comparison functions, you're on your own. In other words, you would need to cram the typmod into the first few bits of the underlying integer, and take it into account in all of your calculations. So it does complicate things a bit, but if you wanted to release it as a general-purpose fixed point implementation, then I think this would be the way to go about it. – Nick Barnes Mar 04 '16 at 08:54