2

I am working with data that is represented using the following basic syntax:

a→b→c

EDIT

This is used to describe a spatiotemporal relation of alteration assemblages around an ore deposit. An alteration assemblage can either be spatially or temporally related to the mineralisation of a deposit, which could be constructed as such

(a[proximal]→b→c[distal])[early]→(d[proximal]→e→f[distal])[late]

Both the groups a, b, c and d, e, f are spatial relations to the ore deposit, and these two groups can be temporally related to each other. These can get more complex, in full they often look like this:

(ksp-cal-cpx±anr [early] → cpx-act-scp-cal-mt±adr±tm±all±ap±bt±ksp → chl-ep-cal-cp-py-mt [late]) [proximal] → cpx±amp±scp → qtz-hm-ab-ser-cal±scp±py±po±fl [distal]

(Note: the 'proximal', 'distal', 'early', and 'late' descriptors in square brackets are often, but not always, present. The syntax like 'ksp-cal-cpx±anr' is a mineral assemblage, which I have abstracted away with a, b, c, etc in my examples)

In my table I represent a→b→c in the following way

ID TERM PARENT GROUP_LEVEL
1 a 1
2 b 1 1
3 c 2 1

Sometimes these data can have nested groups, eg

a→(b→c)

I indicate the change of level of these groups using the fourth column GROUP_LEVEL

ID TERM PARENT GROUP_LEVEL
4 a 1
5 b 4 2
6 c 5 2

The nesting can potentially get more complex, eg

a→((b→c)→d→(e→f))

The table is then structured as follows:

ID TERM PARENT GROUP_LEVEL
7 a 1
8 b 7 3
9 c 8 3
10 d 8 2
11 e 10 3
12 f 11 3

I've tried using the following SQL, but cannot get it to work

with hierarchical_terms as
(select 
CONNECT_BY_ROOT id as id,
level as term_level,
term,
group_level,
connect_by_isleaf leaf_node,
row_number()
               over (PARTITION BY CONNECT_BY_ROOT id, group_level ORDER BY term nulls last) as group_order
from test_groupings tg

connect by prior tg.id =  tg.parent start with parent is null

)
select id,
listagg (
    case when term_level =1 then 
        term
    when term_level > 1 then

        case when group_level > 1 and group_order = 1 then
            '→('||term
   
        when  group_level > 1 and leaf_node = 1 then
           '→'|| term || ')'
        else 
            '→'||term
        end
    end, 
 '')   as terms 
    
    from hierarchical_terms
group by id

However this doesn't quite work, row in bold is not formatted the way I like it.

ID TERMS
1 a→b→c
4 a→(b→c)
7 a→(d→(b→c)→e→f)

What I want is for each 'group level' to be encapsulate by parentheses, and then connected by their hierarchical relations. But I can't seem to achieve this no matter what I try.

Any help would be greatly appreciated.

Michael
  • 47
  • 5
  • 2
    Not sure what are the rules to derive a table content from the brackets notation first. So the rules of a reverse query are also not clear. – Serg Aug 22 '22 at 07:56
  • 1
    Is 10 parent is really 8 and not 9? – David דודו Markovitz Aug 22 '22 at 13:56
  • 1
    Is there really an hierarchy here or just an order? – David דודו Markovitz Aug 22 '22 at 13:58
  • Hi @Serg I have added some detail on how the rules are meant to work. There are no real formal rules as such, I have had to infer a basic set of rules from how the geologists represent this data in free text (which is not always consistent). – Michael Aug 22 '22 at 23:57
  • @DavidדודוMarkovitz there is a hierarchy, I've attempted an explanation above – Michael Aug 23 '22 at 04:23
  • The geological info adds little to the problem formalism. Still can't see how `a→((b→c)→d→(e→f))` means `b is a parent of d` for example. And if the fact `b is a parent of g` is added to the story what would be its possible bracket notations? – Serg Aug 23 '22 at 19:06

1 Answers1

1

You may not be able to do this in SQL, but you can do it in PL/SQL:

Create the types:

CREATE TYPE node IS OBJECT(
  group_level NUMBER(5,0),

  NOT FINAL NOT INSTANTIABLE MEMBER PROCEDURE add_child(
    self        IN OUT NOCOPY node,
    id          NUMBER,
    term        VARCHAR2,
    parent      NUMBER,
    group_level NUMBER
  ),

  NOT FINAL NOT INSTANTIABLE MEMBER FUNCTION to_char(self node) RETURN VARCHAR2,
  NOT FINAL NOT INSTANTIABLE MEMBER FUNCTION first_id(self node) RETURN NUMBER,
  NOT FINAL NOT INSTANTIABLE MEMBER FUNCTION last_id(self node) RETURN NUMBER
) NOT FINAL NOT INSTANTIABLE;

CREATE TYPE node_list IS TABLE OF NODE;

CREATE TYPE branch_node UNDER node (
  nodes node_list,
  CONSTRUCTOR FUNCTION branch_node(
    self        IN OUT NOCOPY branch_node,
    group_level NUMBER
  ) RETURN SELF AS RESULT,
  
  OVERRIDING MEMBER PROCEDURE add_child(
    self        IN OUT NOCOPY branch_node,
    id          NUMBER,
    term        VARCHAR2,
    parent      NUMBER,
    group_level NUMBER
  ),

  OVERRIDING MEMBER FUNCTION to_char(self branch_node) RETURN VARCHAR2,
  OVERRIDING MEMBER FUNCTION first_id(self branch_node) RETURN NUMBER,
  OVERRIDING MEMBER FUNCTION last_id(self branch_node) RETURN NUMBER
);

CREATE TYPE leaf_node UNDER node (
  id     NUMBER(5,0),
  term   VARCHAR2(5),
  parent NUMBER(5,0),

  OVERRIDING MEMBER PROCEDURE add_child(
    self        IN OUT NOCOPY leaf_node,
    id          NUMBER,
    term        VARCHAR2,
    parent      NUMBER,
    group_level NUMBER
  ),

  OVERRIDING MEMBER FUNCTION to_char(self leaf_node) RETURN VARCHAR2,
  OVERRIDING MEMBER FUNCTION first_id(self leaf_node) RETURN NUMBER,
  OVERRIDING MEMBER FUNCTION last_id(self leaf_node) RETURN NUMBER
);

CREATE TYPE term_details IS OBJECT (
  id   NUMBER(5,0),
  term VARCHAR2(4000)
);

CREATE TYPE term_details_list IS TABLE OF term_details;

With the bodies:

CREATE TYPE BODY branch_node IS
  CONSTRUCTOR FUNCTION branch_node(
    self IN OUT NOCOPY branch_node,
    group_level NUMBER
  ) RETURN SELF AS RESULT
  IS
  BEGIN
    self.group_level := group_level;
    self.nodes := node_list();
    RETURN;
  END branch_node;

  OVERRIDING MEMBER FUNCTION to_char(self branch_node) RETURN VARCHAR2
  IS
    terms VARCHAR2(4000);
  BEGIN
    IF self.nodes IS EMPTY THEN
      RETURN '()';
    END IF;
    terms := '(' || self.nodes(1).to_char();
    FOR i IN 2 .. self.nodes.COUNT LOOP
      terms := terms || '->' || self.nodes(i).to_char();
    END LOOP;
    terms := terms || ')';
    RETURN terms;
  END to_char;

  OVERRIDING MEMBER FUNCTION first_id(self branch_node) RETURN NUMBER
  IS
  BEGIN
    IF self.nodes IS EMPTY THEN
      RETURN NULL;
    ELSE
      RETURN self.nodes(1).first_id();
    END IF;
  END first_id;

  OVERRIDING MEMBER FUNCTION last_id(self branch_node) RETURN NUMBER
  IS
  BEGIN
    IF self.nodes IS EMPTY THEN
      RETURN NULL;
    ELSE
      RETURN self.nodes(self.nodes.COUNT).last_id();
    END IF;
  END last_id;
  
  OVERRIDING MEMBER PROCEDURE add_child(
    self        IN OUT NOCOPY branch_node,
    id          NUMBER,
    term        VARCHAR2,
    parent      NUMBER,
    group_level NUMBER
  )
  IS
    debug CONSTANT BOOLEAN := FALSE;
  BEGIN
    IF     self.nodes IS NOT EMPTY
       AND group_level > self.group_level
       AND self.nodes(self.nodes.COUNT) IS OF (branch_node)
       AND parent BETWEEN self.nodes(self.nodes.COUNT).first_id()
                      AND self.nodes(self.nodes.COUNT).last_id()
    THEN
      IF debug THEN
      DBMS_OUTPUT.PUT_LINE(
        'Add to last branch: ' || self.group_level || ' -> ' || self.nodes.COUNT
        || ' (' || id || ', ' || term || ', ' || parent || ', ' || group_level || ')'
      );
      END IF;
      self.nodes(self.nodes.COUNT).add_child(id, term, parent, group_level);
      RETURN;
    END IF;
    self.nodes.EXTEND;
    IF group_level = self.group_level THEN
      IF debug THEN
      DBMS_OUTPUT.PUT_LINE(
        'Add new leaf: ' || self.group_level || ' -> ' || self.nodes.COUNT
        || ' (' || id || ', ' || term || ', ' || parent || ', ' || group_level || ')'
      );
      END IF;
      self.nodes(self.nodes.COUNT) := leaf_node(group_level, id, term, parent);
    ELSE
      IF debug THEN
      DBMS_OUTPUT.PUT_LINE(
        'Add new branch: ' || self.group_level || ' -> ' || self.nodes.COUNT
        || ' (' || id || ', ' || term || ', ' || parent || ', ' || group_level || ')'
      );
      END IF;
      self.nodes(self.nodes.COUNT) := branch_node(self.group_level + 1);
      self.nodes(self.nodes.COUNT).add_child(id, term, parent, group_level);
    END IF;
  END add_child;
END;
/

CREATE TYPE BODY leaf_node IS
  OVERRIDING MEMBER PROCEDURE add_child(
    self        IN OUT NOCOPY leaf_node,
    id          NUMBER,
    term        VARCHAR2,
    parent      NUMBER,
    group_level NUMBER
  )
  IS
  BEGIN
    RAISE_APPLICATION_ERROR(-20000, 'Not valid for a leaf node.');
  END add_child;

  OVERRIDING MEMBER FUNCTION to_char(
    self IN leaf_node
  ) RETURN VARCHAR2
  IS
    terms VARCHAR2(4000);
  BEGIN
    RETURN self.term;
  END to_char;

  OVERRIDING MEMBER FUNCTION first_id(
    self IN leaf_node
  ) RETURN NUMBER
  IS
  BEGIN
    RETURN self.id;
  END first_id;

  OVERRIDING MEMBER FUNCTION last_id(
    self IN leaf_node
  ) RETURN NUMBER
  IS
  BEGIN
    RETURN self.id;
  END last_id;
END;
/

Then you can create a pipelined function:

CREATE FUNCTION get_hierarchy
  RETURN term_details_list PIPELINED
IS
  v_node branch_node;
  debug CONSTANT BOOLEAN := FALSE;
BEGIN
  FOR r IN (
    SELECT *
    FROM   table_name t
    START WITH parent IS NULL
    CONNECT BY PRIOR id = parent
    ORDER SIBLINGS BY id
  )
  LOOP
    IF r.parent IS NULL THEN
      IF v_node IS NOT NULL THEN
        PIPE ROW (term_details(v_node.first_id, v_node.to_char()));
      END IF;
      IF debug THEN
        DBMS_OUTPUT.PUT_LINE('------');
      END IF;
      v_node := branch_node(1);
    END IF;
    v_node.add_child(r.id, r.term, r.parent, r.group_level);
  END LOOP;

  IF v_node IS NOT NULL THEN
    PIPE ROW (term_details(v_node.first_id, v_node.to_char()));
  END IF;
EXCEPTION
  WHEN NO_DATA_NEEDED THEN
    NULL;
END;
/

Then the output from the function:

SELECT *
FROM   TABLE(get_hierarchy());

For the sample data:

CREATE TABLE table_name (ID, TERM, PARENT, GROUP_LEVEL) AS
SELECT  1, 'a', NULL, 1 FROM DUAL UNION ALL
SELECT  2, 'b',    1, 1 FROM DUAL UNION ALL
SELECT  3, 'c',    2, 1 FROM DUAL UNION ALL
SELECT  4, 'a', NULL, 1 FROM DUAL UNION ALL
SELECT  5, 'b',    4, 2 FROM DUAL UNION ALL
SELECT  6, 'c',    5, 2 FROM DUAL UNION ALL
SELECT  7, 'a', NULL, 1 FROM DUAL UNION ALL
SELECT  8, 'b',    7, 3 FROM DUAL UNION ALL
SELECT  9, 'c',    8, 3 FROM DUAL UNION ALL
SELECT 10, 'd',    8, 2 FROM DUAL UNION ALL
SELECT 11, 'e',   10, 3 FROM DUAL UNION ALL
SELECT 12, 'f',   11, 3 FROM DUAL UNION ALL
SELECT 13, 'a', NULL, 1 FROM DUAL UNION ALL
SELECT 14, 'b',   13, 2 FROM DUAL UNION ALL
SELECT 15, 'c',   13, 2 FROM DUAL UNION ALL
SELECT 16, 'a', NULL, 1 FROM DUAL UNION ALL
SELECT 17, 'b',   16, 3 FROM DUAL UNION ALL
SELECT 18, 'c',   16, 3 FROM DUAL;

Is:

ID TERM
1 (a->b->c)
4 (a->(b->c))
7 (a->((b->c)->d->(e->f)))
13 (a->(b)->(c))
16 (a->((b))->((c)))

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117