0

I need to create a breadcrumb field in a child/father table in greenplum MPP

The original table has 2 fields: father, child I need to extract a view with the breadcrumb

For example having this records

1, 2
1, 3
2, 4
4, 5

I need to extract:

1/2
1/3
1/2/4 
1/2/4/5

and so on

Greenplum is an MPP processing engine based on postgresql 8.2. Basically it's a postgres but, giving the version, it does not support the "with recursive" postgresql feature from 8.3.

Furthermore if i try to create a function to do the job i encounter other GP limitations

For example if i create a function like the following


    CREATE OR REPLACE FUNCTION   getBreadCrumb(decimal) RETURNS text AS 'DECLARE
    itemid ALIAS FOR $1;
    itemfullname text;
    itemrecord RECORD;
    BEGIN
        SELECT * INTO itemrecord FROM mytable where father=itemid;
        itemfullname := coalesce(itemfullname, '''') || itemrecord.father::text;
        IF itemrecord.child IS NOT NULL  THEN
           itemfullname := itemfullname || ''/'' || getBreadCrumb(itemrecord.child)::text ;
           RETURN itemfullname;
        ELSE
           RETURN itemfullname;
        END IF;
    END'  LANGUAGE 'plpgsql'

and then try to select from that i get


    ERROR:  function cannot execute on segment because it accesses relation "mytable" (functions.c:152)

which to me seems related to the 'shared nothing' architecture of greenplum.

Any other idea to create the breadcrumb giving the environment i have?

thanks in advance

1 Answers1

1

I had a similar problem working with hierarchies. The only option to solve it is to write a pl/pgsql function executing on the master that would update the "hierarchy" field in a loop like this:

Initial contents:

Node_id | Parent_id | Hierarchy
1       | null      | {1}
2       | 1         | {2}
3       | 1         | {3}
4       | 3         | {4}
5       | 3         | {5}
6       | 5         | {6}

First iteration:

Node_id | Parent_id | Hierarchy
1       | null      | {1}
2       | 1         | {2, 1}
3       | 1         | {3, 1}
4       | 3         | {4, 3}
5       | 3         | {5, 3}
6       | 5         | {6, 5}

Second iteration:

Node_id | Parent_id | Hierarchy
1       | null      | {1}
2       | 1         | {2, 1}
3       | 1         | {3, 1}
4       | 3         | {4, 3, 1}
5       | 3         | {5, 3, 1}
6       | 5         | {6, 5, 3}

Last iteration:

Node_id | Parent_id | Hierarchy
1       | null      | {1}
2       | 1         | {2, 1}
3       | 1         | {3, 1}
4       | 3         | {4, 3, 1}
5       | 3         | {5, 3, 1}
6       | 5         | {6, 5, 3, 1}

One optimization you can make to avoid full self-join on each iteration is to store the additional "level" field and increase it for updated records (because only the records updated on iteration i should be considered for update on iteration i+1)

Unfortunately, GPDB does not support with recursive and executing queries on the segment level, so this is the only option

0x0FFF
  • 4,948
  • 3
  • 20
  • 26
  • I thought of a similar solution but in my case the hierarchy deepness is unknown. Just to let other know i transferred this logic to the oracle original db from which i've extracted the data adding a calculated field with the hierarchy – Stefano Ghezzi Jan 15 '15 at 16:05
  • If deepness is not known you have to make a lot of cycles. But in my case it worked fine even with 500m rows table with 100 levels of hierarchy – 0x0FFF Jan 15 '15 at 16:07