
I need to optimize the performance of a common WITH RECURSIVE query... We can limit the depth of the tree and decompose the work into many updates, and we can also change the representation (use an array)... I tried some options, but perhaps there is a "classic optimization solution" that I'm not seeing. (A sketch of the decomposition idea is after the main query below.)

All details

There is a table t_up, to be updated, with a composite primary key (pk1,pk2), one attribute attr, and an array of references to primary keys... And an unnested representation, t_scan, with the references; like this:

pk1 | pk2 | attr | ref_pk1 | ref_pk2
n   | 123 | 1    |         |
n   | 456 | 2    |         |
r   | 123 | 1    | w       | 123
w   | 123 | 5    | n       | 456
r   | 456 | 2    | n       | 123
r   | 123 | 1    | n       | 111
n   | 111 | 4    |         |
... |  ...| ...  | ...     | ...

There are no loops.

UPDATE t_up SET x = pairs
FROM (
    WITH RECURSIVE tree AS (
      -- base case: every row of t_scan starts with the sentinel array[0,0]
      SELECT pk1, pk2, attr, ref_pk1, ref_pk2,
             array[array[0,0]]::bigint[] AS all_refs
      FROM t_scan
      UNION ALL
      -- recursive step: append the child's (attr,pk2) pair to the parent's list
      SELECT c.pk1, c.pk2, c.attr, c.ref_pk1, c.ref_pk2,
             p.all_refs || array[c.attr, c.pk2]
      FROM t_scan c JOIN tree p
        ON  c.ref_pk1 = p.pk1 AND c.ref_pk2 = p.pk2 AND c.pk2 != p.pk2
            AND array_length(p.all_refs,1) < 5  -- depth limit (5 or 6), avoiding endless loops
    )
    -- array_agg_cat is a custom aggregate, defined in the test section below
    SELECT pk1, pk2, array_agg_cat(all_refs) AS pairs
    FROM (
      SELECT DISTINCT pk1, pk2, all_refs
      FROM tree
      WHERE array_length(all_refs,1) > 1 -- ignores rows that only have the initial array[0,0]
    ) t
    GROUP BY 1, 2
    ORDER BY 1, 2
) rec
WHERE rec.pk1 = t_up.pk1 AND rec.pk2 = t_up.pk2;
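
For reference, this is what I mean by "decompose in many updates": materialize the tree level by level, instead of in a single recursive statement. A minimal sketch, assuming a helper table t_tree with a depth column (both names are mine, not part of the real schema):

-- level 0: the same sentinel rows as the base case of the recursive CTE
CREATE TEMP TABLE t_tree AS
SELECT pk1, pk2, attr,
       array[array[0,0]]::bigint[] AS all_refs,
       0 AS depth
FROM t_scan;

-- one INSERT per level; 0..3 reproduces the array_length(...)<5 limit
DO $$
BEGIN
  FOR lvl IN 0..3 LOOP
    INSERT INTO t_tree
    SELECT c.pk1, c.pk2, c.attr,
           p.all_refs || array[c.attr, c.pk2],
           p.depth + 1
    FROM t_scan c JOIN t_tree p
      ON c.ref_pk1 = p.pk1 AND c.ref_pk2 = p.pk2 AND c.pk2 != p.pk2
    WHERE p.depth = lvl;
  END LOOP;
END $$;

The DISTINCT / array_agg_cat / UPDATE part stays the same, reading from t_tree instead of tree; the point is that each level becomes a plain join that can be planned (and indexed) independently.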

To test:

CREATE TABLE t_scan(
      pk1 char,pk2 bigint, attr bigint,
      ref_pk1 char, ref_pk2 bigint
);
INSERT INTO t_scan VALUES
('n',123, 1 ,NULL,NULL),
('n',456, 2 ,NULL,NULL),
('r',123, 1 ,'w'    ,123),
('w',123, 5 ,'n'    ,456),
('r',456, 2 ,'n'    ,123),
('r',123, 1 ,'n'    ,111),
('n',111, 4 ,NULL,NULL);
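
Note that array_agg_cat is not a built-in aggregate. Assuming it simply concatenates the arrays (instead of nesting them, as array_agg would), a minimal definition is:

-- custom aggregate: the state is the running concatenation of all input arrays
CREATE AGGREGATE array_agg_cat(anyarray) (
  SFUNC = array_cat,
  STYPE = anyarray
);

The target table t_up is also not created by the script above; a minimal guess, just enough to run the UPDATE (x being the column that receives the pairs):

-- assumed shape of the target table; the real one has more attributes
CREATE TABLE t_up(
  pk1 char, pk2 bigint, attr bigint,
  x bigint[],
  PRIMARY KEY (pk1, pk2)
);
INSERT INTO t_up
SELECT DISTINCT pk1, pk2, attr, NULL::bigint[]
FROM t_scan;

And, as the comments below point out, indexes will matter for the Big Data case; as a first guess, the recursive join would want at least:

-- supports the child lookup (c.ref_pk1, c.ref_pk2) = (p.pk1, p.pk2)
CREATE INDEX ON t_scan (ref_pk1, ref_pk2);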

Running only the rec subquery, you will obtain:

 pk1 | pk2 |      pairs      
-----+-----+-----------------
 r   | 123 | {{0,0},{1,123}}
 r   | 456 | {{0,0},{2,456}}
 w   | 123 | {{0,0},{5,123}}

But, unfortunately, to appreciate the "Big Data performance problem", you need to see it in a real database... I am preparing a public GitHub repository that runs it against OpenStreetMap Big Data.

  • Perhaps some clues [here](https://stackoverflow.com/a/5928111/287948) – Peter Krauss Aug 23 '18 at 09:35
  • Please **[edit]** your question and add the `create table` statements for the tables in question (including all indexes), the query you are using and the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers)`**. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) – Aug 23 '18 at 10:04
  • To arrive at a big data solution, it'd help to know more about the "real" table-- like indexes, etc... that stuff will play a role in this query – Joe Love Aug 23 '18 at 18:26
