SQL Recomputation in queries?

Question

SELECT COUNT(*) FROM table1 WHERE COUNT(*) > 5;

In this query, we have called COUNT(*) twice, causing two computations behind in the scenes. Is this the optimal way to do the query or is there some better way? Does the query engine just cache COUNT(*) from table1?

Another example:

SELECT MAX(col1) FROM table1 WHERE MAX(col1) > 5;

The query engine can decide what it wants to do. The expensive part of the query is reading all the rows, not counting them, so I wouldn't worry if it calculates it twice. — Gordon Linoff, Jun 16 '19 at 01:43
@GordonLinoff, is this the only way to write the query though? I was only giving an example, you could have `MAX(someColumn)`, computing this several times is expensive I would imagine — asknsdkns, Jun 16 '19 at 01:47
(1) `max()` is generally not very expensive either (relative to reading t he rows). (2) You can use a subquery. — Gordon Linoff, Jun 16 '19 at 01:48
This is a faq. Before considering posting please always google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names; read many answers. If you post a question, use one phrasing as title. [ask] PS MySQL manual re SELECT: "The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior, and permits HAVING to refer to columns in the SELECT list and columns in outer subqueries as well." — philipxy, Jun 16 '19 at 03:09
Read about query execution. You are describing a result & the code you write describes that & not an algorithm. If you want to write code that is executed as quickly as you can get it to you need to learn how to query & then principles of query implementation/optimization both generally & in a particular version of a particular DBMS. PS This is also a faq. Unfortunately but as usual these faqs mostly show no research. See [help] links & the voting arrow mouseover texts. PS The SQL standard doesn't even guarantee a "deterministic" (SQL term) call on the same values happens more than once. — philipxy, Jun 16 '19 at 03:27

score 0 · Answer 1 · answered Jun 16 '19 at 01:54

0

This query is not syntactically correct:

SELECT COUNT(*)
FROM table1
WHERE COUNT(*) > 5;

Aggregation functions are not allowed in the WHERE clause. You probably intend:

SELECT COUNT(*)
FROM table1
HAVING COUNT(*) > 5;

In many databases (including MySQL, you can express this as:

SELECT COUNT(*) as cnt
FROM table1
HAVING cnt > 5;

I simply do not know if the second reference to cnt is guaranteed not to recalculate the value. In fact, it might depend on the database.

That said, if you want to guarantee that no recalculation occurs, use a subquery:

SELECT cnt
FROM (SELECT COUNT(*) as cnt
      FROM table1
     ) x
WHERE cnt > 5;

answered Jun 16 '19 at 01:54

Gordon Linoff

1,242,037
58
646
786

Hmm, I don't *know* it too, but does a subquery really *guarantee* anything? Given that the outer query uses a `WHERE` that, if pushed down, needs to be converted into a `HAVING` might be a certain hurdle. But to me, it doesn't seem impossible for an optimizer to do that. Maybe the only thing that *guaranteed* it, would be to materialize the inner query, may it be via hints or similar or with a `CREATE TABLE ... AS` if necessary. But I might be very wrong on this... – sticky bit Jun 16 '19 at 02:07
@stickybit Nothing one writes guarantees anything except the value of the result. – philipxy Jun 16 '19 at 03:10

SQL Recomputation in queries?

1 Answers1