2

I am writing analytics software that generates complicated queries. When building the where clause, it may happen that multiple constraints are added on the same database field. I was wondering if PostgreSQL rewrites multiple constraints into simpler ones. I did some tests:

SELECT COUNT(id) FROM employee WHERE age BETWEEN 18 AND 40;

I ran this 10 times and the average time spent was 65ms. Now I make the query a bit longer but trivial to optimize:

SELECT COUNT(id) FROM employee WHERE
(age BETWEEN 18 AND 40) AND
(age BETWEEN 18 AND 40) AND
(age BETWEEN 18 AND 40);

This query takes 100ms on average, which is a lot slower. Also, the following query:

SELECT COUNT(id) FROM employee WHERE
(age BETWEEN 28 AND 70) AND
(age BETWEEN 25 AND 40) AND
(age BETWEEN 22 AND 33) AND
(age BETWEEN 18 AND 30);

takes 105ms on average, while it is equivalent to:

SELECT COUNT(id) FROM employee WHERE age BETWEEN 28 AND 30;

which runs twice as fast.

These queries are semantically equivalent and I was expecting them to be optimized into the very same query before even the planner touches them. This seems like low-hanging fruit for the query rewriter. Is there any hidden configuration option I'm missing? I'm using postgresql 9.4.5.

Thank you!

nschagen
  • 31
  • 2

1 Answers1

1

The optimiser doesn't fold contiguous ranges together. It doesn't do that sort of datatype-level analysis.

PostgreSQL doesn't really care if you're testing ranges of integers, floating point values, or text strings. Range-folding like this would only be correct for types that are discrete countable ordinals. If you tried it for, say, floating point values you might get subtly wrong answers.

PostgreSQL doesn't know which types would satisfy the requrirements for this to be safe, so it can't do it. Also, every possible optimisation done by the query planner has a computational cost to check whether the optimisation might apply, so there's a trade-off between planning and execution costs.

TL;DR: This case won't be auto-optimised by the planner.

In future, though: always provide your PostgreSQL version and explain (buffers, analyze) output for the query/queries.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778