1

I wonder if anybody can please confirm the expected behavior of PostgreSQL Inlining?

From the Microsoft SQL world, any function which is determined to be inline, will be used in a single iteration over a number of lines (The function body is essentially injected into the caller statement, making it set-based [One call] instead of a probe per input row of data [Many calls]).

My team and I were battling to prove this without a Profiler such as in MSSQL, but we were able to prove that eventually and found that the number of iterations on our function we think is inline is directly proportional to the number of rows over which it operates.

We did so by introducing an intentional wait in the function (pg_sleep) where we can see that a wait of N seconds leads to a total execution time of Rows*N, i.e. over an input of 6 rows, waiting 1 second is 6 seconds, waiting 2 is 12 and so on.

So our questions are:

  1. Is inlining in PostgreSQL what we think it is (Equivalent to MSSQLs Inline function [Type='IF'])?
  2. Is there a profiling tool which is able to show this clearly, like Profiler in MSSQL is able to?
  3. Are there any meta-data markers we can look at to confirm/deny that our function is indeed inline-able?

1 Answers1

10

The term "inlining" has a different meaning in Postgres. That usually refers to language sql functions which are completely replaced by the contained query when used inside another query, e.g. this function

create or replace function customers_in_zip(p_zip_code varchar(5))
  returns setof customers
as
$$
  select *
  from customers
  where zip_code = p_zip_code;
$$
language sql;

used like this:

select *
from orders o 
  join customers_in_zip('42') c on o.customer_id = c.id;

would be expanded by the optimizer to:

select *
from orders o 
  join customers c on o.customer_id = c.id and c.zip_code = '42';

That type of inlining can be seen when generating the execution plan using explain (analyze). For this to work the function has to be marked as immutable or stable

E.g. if the function can be "inlined", the plan looks something like this:

Nested Loop  (cost=2.39..200.79 rows=79 width=52) (actual time=0.021..0.165 rows=115 loops=1)
  ->  Bitmap Heap Scan on public.customers  (cost=1.97..20.71 rows=13 width=28) (actual time=0.014..0.023 rows=15 loops=1)
        Recheck Cond: ((customers.zip_code)::text = '80807'::text)
        ->  Bitmap Index Scan on customers_zip_code_idx  (cost=0.00..1.96 rows=13 width=0) (actual time=0.010..0.010 rows=15 loops=1)
              Index Cond: ((customers.zip_code)::text = '80807'::text)
  ->  Index Scan using idx_orders_cust_id on public.orders o  (cost=0.42..13.84 rows=8 width=24) (actual time=0.003..0.008 rows=8 loops=15)
        Index Cond: (o.customer_id = customers.id)

As you can see, there is no reference to the function (the plan for the query without the function looks pretty much the same).

If the function is not inlined (e.g. because it was not declared stable or because it's a PL/pgSQL function rather than a SQL function), the plan would look something like this:

Nested Loop  (cost=0.68..139.94 rows=77 width=110) (actual time=0.710..0.862 rows=115 loops=1)
  ->  Function Scan on public.customers_in_zip c  (cost=0.25..0.26 rows=10 width=86) (actual time=0.696..0.697 rows=15 loops=1)
        Function Call: customers_in_zip('42'::character varying)
        Buffers: shared hit=18
  ->  Index Scan using idx_orders_cust_id on public.orders o  (cost=0.42..13.96 rows=8 width=24) (actual time=0.004..0.009 rows=8 loops=15)
        Output: o.id, o.customer_id, o.order_date, o.amount, o.sales_person_id
        Index Cond: (o.customer_id = c.id)

From your description, it seems you are not referring to that kind of "inlining", but rather whether a scalar function is only called once if it does not depend on values take from the row, e.g.:

select col1, some_function(), col2
from some_table;

If some_function() is declared immutable it will only be called once.

Quote from the manual

IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; [...] If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.

That is not something you can see directly in the execution plan, but the following will demonstrate it:

create function expensive_scalar(p_some_input integer)
  returns integer
as 
$$
begin
  perform pg_sleep(10);
  return p_some_input * 2;
end;  
$$
language plpgsql
IMMUTABLE;

The perform pg_sleep(10); makes the function take 10 seconds to execution. The following query will call that function a hundred times:

select i, expensive_scalar(2)
from generate_series(1,100) i;

But the execution is just a little bit over 10 seconds, which clearly shows the function is only called once.


As far as I know, Postgres will also cache results for functions marked as stable during the execution of a single statement (for the same input values).

That is a bit harder to show though. Typically you can do that by putting raise notice statements (Postgres' equivalent to print) into the function and see how often they are printed.

  • Our scenario calls instead for a table function (Recursive CTE work going on in there), one which also accesses user data. The answer would still be deterministic and hopefully still immutable; given the underlying data not changing. That does offer an interesting extension to it. We really appreciate your guidance and will simulate and revert! – Dylan Westphal Oct 29 '18 at 08:25
  • 1
    A function that accesses data from the database can never be immutable - it can only be stable. If you hope for "inlining" you have to use `language sql` functions. PL/pgSQL functions are always treated as a block box –  Oct 29 '18 at 08:32