0

any help greatly appreciated. Quite stumped on this one! Amateur at SQL, but think BigQuery isn't helping me out here.

I have a table that looks something like this:

Object  ID1 ID2
A   1   null
B   2   1
C   3   2
D   4   3

A, B, C and D are actually the same object. You can see that they are linked by a "chain" of child IDs. A is the "master" object.

I would like a new column in this table which returns "A" (or equivalent) for linked objects (calling this P1-Obj). Here's what I'd like the output to look like:

P1-Obj  Object  ID1 ID2
A   A   1   null
A   B   2   1
A   C   3   2
A   D   4   3
E   E   5   null
E   F   6   5
E   G   7   6
E   H   8   7

This would have to work with "chains" of arbitrary length (i.e. could have 0-100 objects connected to P1-Obj).

I solved this in a spreadsheet using this formula - imagining that the above table had columns A-D, this formula works when put in column A: =IF(ISBLANK($D2),$B2,INDIRECT("A"&MATCH($D2,$C:$C,0))). I need the SQL equivalent of this!

So far, I tried to use a CTE with recursion to get me started on a data structure I could then work with to get to the above.

Something like:

WITH cte as (
    SELECT *
    FROM table
    where ID2 is null
    UNION ALL
    SELECT m.*
    FROM table m 
    JOIN cte o
    on m.ID1 = o.ID2) 
select * 
from cte

But BigQuery spits back out: "Table name "table" missing dataset while no default dataset is set in the request" - and from a quick google it looks recursive CTE's are not supported on BigQuery.

I'm not even sure if a recursive CTE is the right thing that I'm looking for to start solving my problem, but I haven't been able to even get an output to see - and every other google avenue has taken me to a dead end.

I'm pretty stumped here as to if there's another way to do this in BigQuery (or with SQL in general)!

Thanks so much in advance for any advice.

EDIT: Added fiddle: https://www.db-fiddle.com/f/m6EN38upDhFLt7eUwtKrLa/3

Not sure if that's doing what I need it to...

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
nakedlunch
  • 11
  • 4
  • BigQuery does not support recursive CTEs nor any other hierarchical approaches that I know of. I would suggest that you find another way to store the hierarchy information. – Gordon Linoff Jul 09 '19 at 17:57
  • BTW: `on m.ID1 = o.ID2` <--you have `o` and `m` swapped (o.ID2 will allways be NULL, so you will only get the *starters*) – wildplasser Jul 09 '19 at 18:16
  • @wildplasser Thank you! Knew there was something going on there. Still none the wiser as the result (on a fiddle, not bigquery) is just spitting the table back at me i.e.: `| PN | ID1 | ID2 | | --- | --- | --- | | A | 1 | | | E | 4 | | | B | 2 | 1 | | F | 5 | 4 | | C | 3 | 2 | | G | 6 | 5 |` `Fiddle here: https://www.db-fiddle.com/f/m6EN38upDhFLt7eUwtKrLa/3 – nakedlunch Jul 09 '19 at 18:22
  • @nakedlunch - in example you provided - there is a clear pattern in how ids are numbered - most likely this was a simplification, but if this is not and there is a pattern in ids to be consecutive for the children of the same object - that would eliminate need in recursive logic. please clarify – Mikhail Berlyant Jul 09 '19 at 19:57
  • @MikhailBerlyant Hi there! I'm afraid not, the real IDs are random hashes, not sequential numbers. Some extra context: I have solved this in a spreadsheet. In the spreadsheet, I have four columns as in the second example above (imagine them as columns A-D) My formula is: =IF(ISBLANK($D2),$B2,INDIRECT("A"&MATCH($D2,$C:$C,0))). Unfortunately, I need the equivalent in SQL :( – nakedlunch Jul 09 '19 at 22:17
  • + If there's a generic way to do it in SQL, BigQuery question aside, I would love to know about this for the future! – nakedlunch Jul 09 '19 at 22:27
  • BTW: it helps to use descriptive column names (such as `parent_id` instead of `ID2`) – wildplasser Jul 09 '19 at 22:42
  • @nakedlunch - just curious - how big the initial table - how many rows are there? – Mikhail Berlyant Jul 10 '19 at 00:30
  • @MikhailBerlyant There are about 14,000 rows, and growing every day – nakedlunch Jul 10 '19 at 07:11

2 Answers2

2

How's this?

WITH RECURSIVE table2 (root_pn, pn, id1, id2) AS
(
  SELECT pn as root_pn, pn, id1, id2
    FROM table1
    WHERE id2 IS NULL
  UNION ALL
  SELECT root_pn, c.pn, c.id1, c.id2
    FROM table2 AS cp JOIN table1 AS c
      ON cp.id1 = c.id2
)
SELECT * FROM table2
ORDER BY root_pn;
DominicEU
  • 3,585
  • 2
  • 21
  • 32
0

Level closures might do the job when you're handling hierarchies. Can have a quick read into them here

Example fiddle here: https://www.db-fiddle.com/f/igzuLfJzsh7nmr2r3w5L5W/5

The jist is you create a closure table which contains all ancestors and descendants and then use that to query against like so;

SELECT t2.pn as `p1-obj`, t1.pn, t1.id, t1.ancestor
FROM t1
JOIN t1_closure t1c ON t1.id = t1c.descendant
JOIN t1 t2 ON t1c.ancestor = t2.id
WHERE t1c.ancestor IN (1, 5);

The above goes on the basis you know the root node to query against. You could add an additional subquery in to calculate what the root nodes are, i.e.

WHERE t1c.ancestor IN (
    # Sub Query here
);
Mikey
  • 2,606
  • 1
  • 12
  • 20