I do have a large sql table (from the automotive industry) containing information similar to the following format:
NAME ID PARENTID CHILDID IN_LAW_ID
---------- ---------- ---------- ---------- ----------
Bill 1 - - 10
Faye 2 - - -
Joe 3 2 1 -
Billy 4 2 1 -
Bob 5 2 1 9
Catherine 6 7 - -
Calvin 7 6 4 -
Achmed 8 - - -
Rachel 9 - - 5
(well, the names are in fact spare parts, parents would be predecessor parts, children successor parts and inlaws would be optional parts...)
My goal would be to add an additional column with a unique ID showing which family a NAME
belongs to.
For example, Achmed has no parent, no child and no inlaws, so he would be tagged as having no family,
Bill on the other hand would belong to the same family as Joe and anyone else who has an ID belonging to the same tree (no matter whether it is a
PARENTID
,CHILDID
orIN_LAW_ID
relationship).
To complicate things a bit, relationships in the tree can be circular.
- I.e. Catherine can have Calvin as
PARENTID
and Calvin can have Catherine asPARENTID
.
Ah, and trees can get quite large, with up to 3000 members.
My current approach is to use tools dedicated for Network Mining and extract and name every distinct subnetwork (thus the full subnetwork with no connection to other networks). Yet these tools run on my laptop and take a full week to generate the final list of FAMILYID
s.
I imagine that a simple (or also a rather complex) SQL query will be much more performant - but I have no idea on how to attack the problem in SQL. (I intend to run it on our Microsoft SQL server btw.)
Any help will be much appreciated!