5

I know there are many integer division questions on StackOverflow, but I did not see this one.

Similar to many programming languages, PostgreSQL performs integer division if both operands are integers.

If one has:

SELECT s.id AS student_id,
       COUNT(DISTINCT(bc.book_id)) / COUNT(c.id) AS average_books_per_class
FROM student s
     LEFT JOIN class c
        ON c.student_id = s.id
     LEFT JOIN book_class bc
        ON bc.class_id = c.id
GROUP BY s.id

Then to get what one intends, one must cast COUNT(DISTINCT(bc.book_id)) to a non-integer number type. If one does not do so, then Postgres, similar to many programming languages, does integer division which is unlikely to give what one wants as a result.

Postgres supports two syntaxes for doing this cast:

CAST( value AS type )

for example:

CAST( COUNT(DISTINCT(bc.book_id)) AS DOUBLE PRECISION )

It also supports the syntax:

value::type

for example:

COUNT(DISTINCT(bc.book_id))::decimal

Both syntaxes work, personally I prefer the one using CAST because it is more explicit (I think explicit is good). Others may prefer value::type because it is expressive yet terse -- shorter is often (up to a limit) better.

My question is about the number type to use.

In casting COUNT(DISTINCT(bc.book_id)) to a non-integer number type, Postgres gives the following types:

  • decimal
  • numeric
  • real
  • double precision

In my query I chose DOUBLE PRECISION.

I am wondering, specifically in the case of division, but also in any other context where one might need to cast an integer number type in PostgreSQL to a non-integer number type, which of the four choices is the best one and why?

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263

1 Answers1

7

decimal and numeric are synonyms, so there is one less choice.

This is the proper type if you either need very high precision (use numeric without any type modifiers) or if you want to round to a certain number of decimal positions (use numeric(20,2) for two decimal positions).

This data type is precise, but slow, because it is a “binary coded decimal” type.

real and double precision are 4-byte and 8-byte floating point numbers, fast but with rounding errors and limited precision.

Never use real, it has very low precision.

In most practical applications it shouldn't matter which of the two remaining types you use for that specific purpose.

Using the CAST syntax has the advantage that it complies with the standard.

Remark: DISTINCT(col) is syntactically correct, but misleading, because it is not a function. Write DISTINCT col instead.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263