Does postgres union guarantee order of execution when invoking functions with side effects?

Question

I am using postgres 9.3 and trying to ensure that stored procedures are called in order when invoked from an sql statement. Does the following work (ensuring that foo() is called first and then bar())?

select null::void from (
    select 1 from foo()
    union select 1 from bar() ) _;

How about the following as well? [NB as it stands just a slightly more longer-winded version of the above -- what I'm after is whether in general CTEs will be executed in order of "invocation", in order of specification, or is it undefined?]

with x as ( select foo() ),
y as ( select bar() )
select null::void from ( select 1 from x union select 1 from y );

In addition to whether the proceeding work, is there a more idiomatic way to ensure execution order (with arbitrary/unimportant return values)? Perhaps just?

select null::void from ( select foo(), bar() ) _;

UPDATE -- to explain why:

What I'm hoping is that, if I specify my functions in sql, the optimizer can eliminate common subexpressions (etc), while keeping in order the parts that contain DDL. For instance, if foo, bar are

create function foo() returns void language sql as $$
    with x as (select is_immutable())
    select is_volatile(x.is_immutable) from x
$$;
create function bar() returns void language sql as $$
    with x as (select is_immutable())
    select is_volatile_2(x.is_immutable) from x
$$;

The optimizer will inline them in sql, and be able to factor out the call to is_immutable(), while still keeping the volatile calls in order.

Punchline (if you don't want to read all of Craig's patient explanation)

The optimizer will inline "language sql" procedures, but it won't eliminate "common subexpressions" (or at least not in the way I was hoping). Thus, trying to affect order or execution within one statement, rather than use more than one statement, wouldn't be helpful, at least for what I was trying to accomplish. There is one trick listed below using CTEs that will work -- but as noted by @harmic in the comments, it might not work in future versions of postgres, and could introduce hard-to-find sporadic errors if this happens.... I'm not convinced that it would never be useful to control execution order, but Craig's advice to make sure you have a good reason before trying is certainly good.

The point you haven't addressed at all is *why*? There are well defined methods for executing things in strict order (`DO` blocks, functions, separate SQL statements). Why are you bypassing them then trying to trick the optimiser into doing what you want? You have the keys, why are you trying to hotwire the car? — Craig Ringer, Oct 15 '14 at 03:20
SQL is declarative and not procedural for a reason. While maintaining order among things that have to be ordered, I want to give the optimizer maximal freedom to reorder the rest. In the current case I have two procedures that both have to do an expensive precomputation when called separately. In one context however, I call them one after the other, and am hoping I can get to optimizer to avoid repeating. I could create a temporary table, check if its there... clean it up in the outside context, etc. I was hoping for a "lazy" (as in lazy evaluation) way. — shaunc, Oct 15 '14 at 03:55
You're mixing the declarative and procedural models, which is why it's getting complicated. In the declarative model you'd simply make the precomputation a CTE term and have both branches of the union reference it. Unfortunately (or fortunately in some ways) CTE terms aren't visible from functions invoked from the context of the CTE, so you can't really mix that with your procedural approach. — Craig Ringer, Oct 15 '14 at 04:34
Ok .. hmm... I was thinking that an SQL procedure with a single statement was treated like a macro and would be expanded in the context it was used (and thus not really "procedural"). I think I probably read that about views and then over-generalized :(. Ok ... I guess I'll just have to write a new function to optimize the case where I do both. Thanks for your help. — shaunc, Oct 15 '14 at 05:15
If you're referring to an SQL function (i.e. `LANGUAGE sql`), then yes, they can be inlined into the caller. That's not the case for PL/PgSQL though. However, your assumption that the optimiser can "factor out" the shared calls to `is_immutable` is false; AFAIK it does not do so. — Craig Ringer, Oct 15 '14 at 05:18
Oh -- then does "CTE terms aren't visible from functions invoked from the context of the CTE" apply to LANGUAGE sql? Anyway -- didn't see the 2nd part of your comment -- so indeed won't help me. — shaunc, Oct 15 '14 at 05:21

Craig Ringer · Accepted Answer · 2014-10-15T03:21:00.130

In practice they will be executed in the order given, but there is no guarantee made.

If it's guaranteed, then it'll be covered in the documentation or the SQL standard. I don't see any mention of order-of-execution of a UNION in either.

If the optimiser had a reason to execute one before the other it would be free to do so.

To ensure order of execution, run statements in the desired order:

SELECT * FROM func1();
SELECT * FROM func2();

If you want to reduce round trips use your client's batching facilities if possible, or use a DO block:

DO
$$
BEGIN
  PERFORM proc1();
  PERFORM proc2();
END;
$$;

If you need to return values, use a function and RETURN QUERY or RETURN NEXT.

Or you can force ordering with a CTE, because in PostgreSQL (unfortunately) CTEs act as optimisation fences that force materialisation of results. However, AFAIK PostgreSQL still doesn't have to execute the CTE terms in the order they're written, or the order they're referenced; the only guarantee you get is if you do this:

WITH f1 AS (SELECT * FROM function1())
SELECT * FROM function2()
UNION ALL
SELECT * FROM f1;

then function1 must be executed and materialised first. That's a PostgreSQL-specific misfeature though; it's not true of other database engines, not guaranteed by the standard, and you shouldn't rely on it.

That doesn't extend to

WITH f1 AS (SELECT * FROM function1())
     f2 AS (SELECT * FROM function2())
SELECT * FROM f2
UNION ALL
SELECT * FROM f1;

... as in this case PostgreSQL can execute the independent CTE terms in either order.

Again, with joins, the same principle applies. If terms are independent then the system may choose to run them in any order, though it generally won't. So:

select null::void from (select 1 from foo() ) left join (select 1 from bar()) on true

could evalulate and materialize bar() then join its results on foo().

If you want ordered execution, you shouldn't be relying on set operations like unions and joins. Use separate queries, or procedural code.

is there a more idiomatic way to ensure execution order (with arbitrary/unimportant return values.

Yes, there is.

SELECT * FROM function1();
SELECT * FROM function2();

Is the optimizer free to reorder even if the functions are declared "volatile"? I'm asking about postgres in particular here. If so, how would you insure order? — shaunc, Oct 15 '14 at 03:00
@shaunc A union is a *set* operation. It forms two sets, then takes the union. So yes, it's completely free to execute either set first. In practice it won't, it'll execute them in order because there's no reason to do otherwise, but it doesn't have to. If you want to ensure (not insure) order, you use two separate SQL statements. Or you use a PL/PgSQL procedure that uses `RETURN QUERY`. Or a CTE. — Craig Ringer, Oct 15 '14 at 03:02
thanks (also re spelling :))... are CTEs guaranteed to be executed in order specified, or do I have to specify a nested sequence? Also, could something like `select null::void from (select 1 from foo() ) left join (select 1 from bar()) on true` be reordered? — shaunc, Oct 15 '14 at 03:12
I'm not sure I'd rely on the CTEs. Maybe in a later release they won't prevent optimization, which would result in mysterious seemingly random failures after an upgrade — harmic, Oct 15 '14 at 03:15
'insure' is a word too, it's just a different word - as in 'insurance', rather than 'ensure' which is 'make sure something happens'. Addressed the CTE question in the edit, and the left join. — Craig Ringer, Oct 15 '14 at 03:15
@shaunc All in all, *you're doing it wrong*. If you want ordered execution, ask for ordered execution using the mechanisms that provide it - separate queries, or procedures. Don't try to trick/hack the optimiser into ordering set operations. — Craig Ringer, Oct 15 '14 at 03:19

Does postgres union guarantee order of execution when invoking functions with side effects?

1 Answers1

Linked