4

The function is stuck just before the loop.

select * from scm_main.fn_connection_stations(1219646)

The message "Start..." is printed but message "... end" will not print.

CREATE OR REPLACE FUNCTION     
scm_main.fn_connection_stations(var_connection_id bigint)
RETURNS SETOF scm_main.typ_connection_stations AS
$BODY$ DECLARE 
    var_affected                 INTEGER DEFAULT 0; 
    var_row                      scm_main.typ_connection_stations%ROWTYPE; 
BEGIN 
    RAISE NOTICE 'Start...'; 
    FOR var_row IN 
    SELECT DISTINCT v.vbvdata_station_id
    FROM scm_main.tbl_vbvdata AS v 
    INNER JOIN scm_main.tbl_packet AS p ON (v.vbvdata_packet_id = 
    p.packet_id and p.packet_connection_id = var_connection_id) 
    --WHERE v.vbvdata_packet_id IN 
    --( SELECT packet_id FROM scm_main.tbl_packet AS o_p WHERE 
    o_p.packet_connection_id = var_connection_id) 
    LOOP 
        RETURN NEXT var_row; 
    END LOOP; 
    RAISE NOTICE '...End'; 
    RETURN; 
END 
$BODY$
  LANGUAGE plpgsql STABLE STRICT
  COST 100
  ROWS 1000;
ALTER FUNCTION scm_main.fn_connection_stations(bigint)
  OWNER TO postgres;

The query itself is very simple and runs when calling directly like this:

SELECT DISTINCT v.vbvdata_station_id 
FROM scm_main.tbl_vbvdata AS v 
INNER JOIN scm_main.tbl_packet AS p ON (v.vbvdata_packet_id =  
p.packet_id and p.packet_connection_id = 1219646) 

I guess the problem must be lockibg of tables. But I absolutely don't understand what is the reason and what can be the solution.

Some time ago the commented part was working. But after a while the same problem was arised. I solved it by changing the query and replacing the condition (cascading query) with the inner join. But This time neither of them work!


Updated

I made it work again with a stupid change:

INNER JOIN scm_main.tbl_packet AS p ON (v.vbvdata_packet_id = p.packet_id 
  and p.packet_connection_id = var_connection_id)`

was changed to:

INNER JOIN scm_main.tbl_packet AS p ON (v.vbvdata_packet_id = p.packet_id)
  where p.packet_connection_id in (select var_connection_id)

and worked perfectly.

Another interesting point is that even the below change didn't work and still hangs:

INNER JOIN scm_main.tbl_packet AS p ON (v.vbvdata_packet_id = p.packet_id) 
  where p.packet_connection_id = var_connection_id

Now I found a solution. But I want to know how it happening like this!

mehrdad seyrafi
  • 3,084
  • 2
  • 19
  • 16
  • what you see in pg_locks where not granted?.. I mean start the f() and check what is locked – Vao Tsun May 18 '15 at 08:37
  • What is the definition of `scm_main.typ_connection_stations`? Please post the table or type definition. – Patrick May 18 '15 at 08:54
  • Dear Voa no lock with granted=false while the function is running – mehrdad seyrafi May 18 '15 at 09:39
  • It could be [caching a bad execution plan](http://www.postgresql.org/docs/9.3/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING). You might improve your chances by re-analysing the table, maybe with a higher statistics target (see `ALTER TABLE ... SET STATISTICS`). You can avoid the cache by running your query through `EXECUTE`. – Nick Barnes May 18 '15 at 10:18
  • @NickBarnes change your comment to Answer. I will flag it as accepted. Reanalyze solved the problem – mehrdad seyrafi May 18 '15 at 10:42

1 Answers1

1

If it was fixed by such a superficial change to the query, then it was probably not "stuck", but just running very slowly due to a poorly chosen execution plan.

The fact that the query performs worse in a function than it does on its own is probably the result of plan caching. Basically, Postgres may try to avoid the cost of re-planning the query by creating and reusing a generic (i.e. parameter-independent) execution plan. Unfortunately, these generic plans can be far from optimal.

A likely explanation for a bad plan is bad statistics; if Postgres doesn't have accurate information about the data, it's bound to make poor decisions. An ANALYSE of the tables involved might help, but this should not usually be necessary - autovacuum should generally keep the stats fairly up-to-date (assuming you have it running).

One common cause of skewed statistics is an uneven distribution of values (in your case, if you have big variations in the number of packets per connection). This might be improved a lot by increasing the level of detail in the statistics, using ALTER TABLE ... SET STATISTICS (followed by an ANALYSE). High values can slow down planning in some cases, but 500 is (probably) a safe starting point.

If nothing else works, you can always bypass the plan cache by running your query through EXECUTE.

Nick Barnes
  • 19,816
  • 3
  • 51
  • 63