0

I do have a large sql table (from the automotive industry) containing information similar to the following format:

   NAME               ID   PARENTID    CHILDID  IN_LAW_ID
   ---------- ---------- ---------- ---------- ----------
   Bill                1          -          -         10
   Faye                2          -          -          -
   Joe                 3          2          1          -
   Billy               4          2          1          -
   Bob                 5          2          1          9
   Catherine           6          7          -          -
   Calvin              7          6          4          -
   Achmed              8          -          -          -
   Rachel              9          -          -          5

(well, the names are in fact spare parts, parents would be predecessor parts, children successor parts and inlaws would be optional parts...)

My goal would be to add an additional column with a unique ID showing which family a NAME belongs to.

  • For example, Achmed has no parent, no child and no inlaws, so he would be tagged as having no family,

  • Bill on the other hand would belong to the same family as Joe and anyone else who has an ID belonging to the same tree (no matter whether it is a PARENTID, CHILDID or IN_LAW_ID relationship).

To complicate things a bit, relationships in the tree can be circular.

  • I.e. Catherine can have Calvin as PARENTID and Calvin can have Catherine as PARENTID.

Ah, and trees can get quite large, with up to 3000 members.

My current approach is to use tools dedicated for Network Mining and extract and name every distinct subnetwork (thus the full subnetwork with no connection to other networks). Yet these tools run on my laptop and take a full week to generate the final list of FAMILYIDs.

I imagine that a simple (or also a rather complex) SQL query will be much more performant - but I have no idea on how to attack the problem in SQL. (I intend to run it on our Microsoft SQL server btw.)

Any help will be much appreciated!

Smandoli
  • 6,919
  • 3
  • 49
  • 83
alec
  • 1
  • This requires recursive CTEs. How large is your netework? Do you need to maintain this for updates and inserts? How many rows are in the table? – Gordon Linoff Apr 24 '19 at 14:19
  • Thanks @GordonLinoff for the quick reply! I'll end up with around 200k networks for a table with 4 Mio rows. Most networks are of sizes between 3-15 spare parts, but there are some of size 3000. Yes, I need to maintain it for updates and inserts: usually I execute my established network mining procedure once every 6 months. – alec Apr 24 '19 at 14:49
  • [This](https://stackoverflow.com/a/42139978/92546) answer demonstrates one technique for detecting and terminating circular loops. During recursion a _path_ of all of the nodes is kept and the recursion is halted if a node is revisited. – HABO Apr 24 '19 at 16:12
  • What version of SQL Server are you using? If using 2017 or newer you might want to consider using the [graph database features](https://learn.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture?view=sql-server-2017) as this is a graph/network problem as you identified. – gomory-chvatal Apr 25 '19 at 16:10
  • Thank you @HABO that's really helpful! – alec Apr 29 '19 at 07:37
  • Thank you @gomory-chvatal! Unfortunately I am still using the 2014 version, but I'll talk to our IT - hopefully an update is within scope! – alec Apr 29 '19 at 07:38

0 Answers0