Psql - sub queries in from clause - bad practice?

Question

The task at hand is to select musicians (pid) and the amount of instruments each play, including instruments only played at a concert - these instruments might not be in the [plays] table.

I've solved it, but I read that sub queries in a from clause should be avoided if possible. Just out of curiosity, can anyone show me a more effective way? Or is this a good solution? I'm using psql.

select a.pid, sum(a.instr)
from 
(
    select pid, count(instr) as instr from plays group by pid
    union all
    select pid, count(instr) as instr from concert group by pid
) as a
group by a.pid;

score 1 · Answer 1 · answered Nov 03 '15 at 01:36

Such queries are not a issue. The query optimizer of the database will take care of getting the best out of this query. In some cases a INNER JOIN will be converted to exactly the same execution plan as a sub-SELECT.

If you think the query has a problem you can always fire up the EXPLAIN ANALYZE function of psql. This will give you a overview what your query is actually doing. This way you can also compare different ways to write the query.

The example you gave... I do not think you can solve this without sub-queries very easily. I think the way you chose is good. Anything involving some LEFT JOINs will be more difficult to read.

To be more specific, the query planner can and often does flatten subqueries, pull conditions up from subqueries to the outer query, push conditions down from the outer query into subqueries, etc. Subqueries are mostly just notational, and don't have as much to do with how the query really gets executed as you might expect. At least in PostgreSQL. — Craig Ringer, Nov 03 '15 at 23:57

score -1 · Answer 2 · edited Feb 03 '17 at 19:53

Advantages

Subqueries are advantageous because they structure the query to isolate each part of the statement, perform the same operation that would ordinarily require complex joins and unions and are easier to read.

Disadvantages

When using subqueries the query optimizer may need to perform additional steps so they take longer to execute than a join.
Uncorrelated subqueries are executed one time for each row of the parent query. If this kind of subqueries process a great amount of data, you should expect it to take a very long time to process the data.

Possible solution:

You can create temporary tables for storing the data of subqueries, then use a JOIN for completing the query. Remember that using a JOIN is better than using a subquery. How to Create a Table
Use a with clause. WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. It allows you to execute a subquery just once instead of executing it for each row. How to Use With Clause

NOTICE: You should avoid using UNION or UNION ALL.

Your disadvantages are incorrect. A correlated subquery is executed once for each row in the outer query, but an uncorrelated subquery isn't. A temp table is often slower, and unnecessary since you could use a CTE, which also has issues due to optimiser fencing. This really isn't accurate. The advantage part isn't correct either, for that matter, since the optimiser can pull up, push down and flatten subqueries. — Craig Ringer, Nov 03 '15 at 06:15

Psql - sub queries in from clause - bad practice?

2 Answers2