4

I have a table, Models that consists of these (relevant) attributes:

-- -----------------------------------------------------
-- Table `someDB`.`Models`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `someDB`.`Models` (
  `model_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `type_id` SMALLINT UNSIGNED NOT NULL,
  -- someOtherAttributes
  PRIMARY KEY (`model_id`),
ENGINE = InnoDB;


+---------+---------+
| model_id| type_id |
+---------+---------+
| 1       | 4       | 
| 2       | 4       | 
| 3       | 5       | 
| 4       | 3       | 
+---------+---------+

And table Model_Hierarchy that shows the parent & child relationship (again, showing only the relevant attributes):

-- -----------------------------------------------------
-- Table `someDB`.`Model_Hierarchy`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `someDB`.`Model_Hierarchy` (
  `parent_id` MEDIUMINT UNSIGNED NOT NULL,
  `child_id` MEDIUMINT UNSIGNED NOT NULL,
  -- someOtherAttributes,

  INDEX `fk_Model_Hierarchy_Models1_idx` (`parent_id` ASC),
  INDEX `fk_Model_Hierarchy_Models2_idx` (`child_id` ASC),
  PRIMARY KEY (`parent_id`, `child_id`),
  CONSTRAINT `fk_Model_Hierarchy_Models1`
    FOREIGN KEY (`parent_id`)
    REFERENCES `someDB`.`Models` (`model_id`)
    ON DELETE CASCADE
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Model_Hierarchy_Models2`
    FOREIGN KEY (`child_id`)
    REFERENCES `someDB`.`Models` (`model_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;


+-----------+----------+
| parent_id | child_id |
+-----------+----------+
| 1         | 2        | 
| 2         | 4        | 
| 3         | 4        | 
+-----------+----------+

If there is a Model that is not a parent or child (at some point) of another Model whose type is 5, it is not valid and hence should be deleted.

This means that Model 1, 2 should be deleted because at no point do they have a model as parent or child with type_id = 5.

There are N levels in this hierarchy, but there are no circular relationship (ie. 1 -> 2; 2 -> 1 will not exist).

Any idea on how to do this?

emihir0
  • 1,200
  • 3
  • 16
  • 39
  • You should switch the names parents and children (in a tree, the nodes that have several childnodes are called parents, not vice versa), and you would actually not need a seperate table for that. I assume you want to call this cleanup at a chosen point (not with every change in the data)? If you are open to change your tree to a nested set or a closure tree, this can be done in a single query; with your current treetype, I don't think it can be done without a procedure. My first idea would be to add a mark-column (or use a temp table) and mark all good nodes during (n) loops. – Solarflare Jun 28 '16 at 17:59
  • Please provide `show create table xyz` for both. Btw finally a decent question in the last hour. – Drew Jun 28 '16 at 18:00
  • @Solarflare both `model_id 2` and `3` contain `model_id 4` as a child. A model can have multiple parents (ie. belonging to more than 1 model) while also having multiple children. – emihir0 Jun 28 '16 at 18:50
  • 1
    @Drew I added create statements for the tables now. If anything is not clear, please let me know. – emihir0 Jun 28 '16 at 19:01
  • If `a->b` means`a` is the parent of` `b`, what is keeping `c->b` also? To me it should be a self-joining hierarchy with a PK on `id` and a column `parent_id`. In any case, is an alter table allowed to add 2 columns on `Model_Hierarchy`, and how many rows in each table, and what max depth? – Drew Jun 28 '16 at 19:56
  • @Drew imagine `b` being a product and `a` and `c` being two different ways to package it. Hence `b` is contained in both `a` and `c` respectively and separately (ie. customer can buy both `a` and `c`), however, any change (outside of the scope of this question) to `b` is reflected to both `a` and `c`. `Model_Hierarchy can be altered, yes (the project is in an early development). Max tree depth is unknown (this is for manufacturing... so a component can consist of a component... that consist of further component etc. no real limit). How many rows? Depends, but I don't expect it to exceed 2^32. – emihir0 Jun 28 '16 at 20:28
  • I find no connection between your two tables. Either an oversight, or my blindness. Let's say it is there (though I still protest and say a non self-join just complicates things). But if there exists such a model / node that is not part of a status 5 then it gets banished as well as its whole branch hierarchy due to `that is not a parent or child (at some point) of another Model whose type is 5` ... it also seems that you are not asking for a function giving an `id` to the developer here, but rather asking them to hunt your whole hierarchy. Lastly, what have you tried so far? – Drew Jun 28 '16 at 20:37
  • @Drew `parent_id` and `child_id` both refer to `model_id` from table `Models` - they are the FKs (and the tuple combination is the PK of `Model_Hierarchy`). `type_id` is not shown in `Model_Hierarchy` in any way. To put it into a better perspective, `type_id 5` refers to a 'real sellable product', hence if a `model` does not contain any sellable product (ie. is a packaging) OR the parent of the `model` at some point is not a sellable product (ie. some unused sub-component), then I want to delete it. I could do this with code in FE but it would be way slower than doing it in BE. – emihir0 Jun 28 '16 at 20:43
  • `If there is a Model that is not a parent or child (at some point) of another Model whose type is 5, it is not valid and hence should be deleted.` which reads to me as the `Model` M table row gets deleted (with a cascade to MH) if (M) it not a status of 5 (because type_id is not housed in the MH but rather M) ... because as shown, MH is never refering to itself but rather always back up to M as you stated in comments. So this sounds like a question that has a solution entirely via the Cascade and nothing else: when you want to do it, if type_id is not 5, delete the row in M. – Drew Jun 28 '16 at 20:55
  • @Drew MH refers to M in both parent and child, it doesn't refer to itself. The idea is to make a procedure such that "For each `model` in `Models` if that `model` doesn't contain (OR is not contained by) another `model` whose `type_id` is `5`, delete it" . This "contains/is contained" relationship can be found in the `Model_Hierarchy` table. – emihir0 Jun 28 '16 at 21:00
  • So if you have 1000 models, and 17 have type_id 5, and you were to visualize with those `model_id` values put into a third table `GoodIds` (columns: `id`, `has_been_processed`). Then for each of those id's, you traverse their MH hierarchy adding rows into `GoodIds`(let's call it `insert ignore` ) for each id down the descendant tree of MH. So GoodId's grows from 17 toward 1000 as `has_been_processed` ends with TRUE. When you are done, any row in `Models` not in `GoodIds` gets deleted ? – Drew Jun 28 '16 at 21:07
  • If the above is true, it does not require a recursive solution. – Drew Jun 28 '16 at 21:15
  • @Drew yes, I could see that working. But how do you go about doing that? – emihir0 Jun 28 '16 at 21:25
  • I can do it, if you say that what I said makes sense. I am sure many here could do that. Half the work is fleshing it out like we just did. – Drew Jun 28 '16 at 21:32
  • @Drew indeed, if not more than half. I should mention that I run my RDBMS on Amazon Web Services and so I do not have SUPER priviledges, ie. I cannot create triggers for instance. I'm not familiar with other DBs hosting services (such as Google's) but if they provide it, I would rather use them (currently the project is under development and so it still would be rather easy to swap to another hosting service). I'm saying this because I imagine this would be a good way to activate the procedure to delete all the redundant models (ie. `AFTER INSERT`). – emihir0 Jun 28 '16 at 21:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115895/discussion-between-emihir0-and-drew). – emihir0 Jun 28 '16 at 21:40

1 Answers1

1

Comments are dispersed throughout the code.

Schema:

CREATE TABLE `Models`
(   -- Note that for now the AUTO_INC is ripped out of this for ease of data insertion
    -- otherwise we lose control at this point (this is just a test)
    -- `model_id` MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    `model_id` MEDIUMINT UNSIGNED PRIMARY KEY,
    `type_id` SMALLINT UNSIGNED NOT NULL
)ENGINE = InnoDB;

CREATE TABLE `Model_Hierarchy`
(   -- OP comments state these are more like components
    --
    -- @Drew imagine b being a product and a and c being two different ways to package it. 
    -- Hence b is contained in both a and c respectively and separately (ie. customer can buy 
    -- both a and c), however, any change (outside of the scope of this question) to b is 
    -- reflected to both a and c. `Model_Hierarchy can be altered, yes (the project is 
    -- in an early development). Max tree depth is unknown (this is for manufacturing... 
    -- so a component can consist of a component... that consist of further component etc. 
    -- no real limit). How many rows? Depends, but I don't expect it to exceed 2^32.
    --
    --
    -- Drew's interpretation of the the above: `a` is a parent of `b`, `c` is a parent of `b`
    --
    `parent_id` MEDIUMINT UNSIGNED NOT NULL,
    `child_id` MEDIUMINT UNSIGNED NOT NULL,

    INDEX `fk_Model_Hierarchy_Models1_idx` (`parent_id` ASC),
    INDEX `fk_Model_Hierarchy_Models2_idx` (`child_id` ASC),
    PRIMARY KEY (`parent_id`, `child_id`),
    key(`child_id`,`parent_id`), -- NoteA1 pair flipped the other way (see NoteA2 in stored proc)

    CONSTRAINT `fk_Model_Hierarchy_Models1`
    FOREIGN KEY (`parent_id`)
    REFERENCES `Models` (`model_id`)
    ON DELETE CASCADE
    ON UPDATE NO ACTION,

    CONSTRAINT `fk_Model_Hierarchy_Models2`
    FOREIGN KEY (`child_id`)
    REFERENCES `Models` (`model_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
)ENGINE = InnoDB;

CREATE TABLE `GoodIds`
(   -- a table to determine what not to delete from models
    `id` int auto_increment primary key,
    `model_id` MEDIUMINT UNSIGNED,
    `has_been_processed` int not null,
    dtFinished datetime null,
    -- index section (none shown, developer chooses later, as he knows what is going on)
    unique index(model_id), -- supports the "insert ignore" concept
    -- FK's below:
    foreign key `fk_abc_123` (model_id) references Models(model_id)
)ENGINE = InnoDB;

To drop and start over from the top:

-- ------------------------------------------------------------
-- reverse order is happier
drop table `GoodIds`;
drop table `Model_Hierarchy`;
drop table `Models`;
-- ------------------------------------------------------------

Load Test Data:

insert Models(model_id,type_id) values
(1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1),(8,1),(9,5),(10,1),(11,1),(12,1);
-- delete from Models; -- note, truncate does not work on parents of FK's

insert Model_Hierarchy(parent_id,child_id) values
(1,2),(1,3),(1,4),(1,5),
(2,1),(2,4),(2,7),
(3,2),
(4,8),(4,9),
(5,1),
(6,1),(6,2),
(7,1),(7,10),
(8,1),(8,12),
(9,11),
(10,11),
(11,12);

-- Set 2 to test (after a truncate / copy paste of this below to up above):
(1,2),(1,3),(1,4),(1,5),
(2,1),(2,4),(2,7),
(3,2),
(4,8),(4,9),
(5,1),
(6,1),(6,2),
(7,1),(7,10),
(8,1),(8,12),
(9,1),
(10,11),
(11,12);
-- truncate table Model_Hierarchy;
-- select * from Model_Hierarchy;
-- select * from Models where type_id=5;

Stored Procedure:

DROP PROCEDURE if exists loadUpGoodIds;
DELIMITER $$
CREATE PROCEDURE loadUpGoodIds()
BEGIN
    DECLARE bDone BOOL DEFAULT FALSE;
    DECLARE iSillyCounter int DEFAULT 0;

    TRUNCATE TABLE GoodIds;
    insert GoodIds(model_id,has_been_processed) select model_id,0 from Models where type_id=5;    
    WHILE bDone = FALSE DO
        select min(model_id) into @the_Id_To_Process from GoodIds where has_been_processed=0;
        IF @the_Id_To_Process is null THEN
            SET bDone=TRUE;
        ELSE
            -- First, let's say this is the parent id.
            -- Find the child id's that this is a parent of
            -- and they qualify as A Good Id to save into our Good table 
            insert ignore GoodIds(model_id,has_been_processed,dtFinished) 
            select child_id,0,null
            from Model_Hierarchy
            where parent_id=@the_Id_To_Process;

            -- Next, let's say this is the child id.
            -- Find the parent id's that this is a child of
            -- and they qualify as A Good Id to save into our Good table 
            insert ignore GoodIds(model_id,has_been_processed,dtFinished) 
            select child_id,0,null
            from Model_Hierarchy
            where child_id=@the_Id_To_Process;

            -- NoteA2: see NoteA1 in schema
            -- you can feel the need for the flipped pair composite key in the above

            UPDATE GoodIds set has_been_processed=1,dtFinished=now() where model_id=@the_Id_To_Process;
        END IF;

        -- safety bailout during development:
        SET iSillyCounter = iSillyCounter + 1;
        IF iSillyCounter>10000 THEN
            SET bDone=TRUE;
        END IF;

    END WHILE;
END$$
DELIMITER ;

Test:

call loadUpGoodIds();

-- select count(*) from GoodIds; -- 9 / 11 / 12
select * from GoodIds limit 10;
+----+----------+--------------------+---------------------+
| id | model_id | has_been_processed | dtFinished          |
+----+----------+--------------------+---------------------+
|  1 |        9 |                  1 | 2016-06-28 20:33:16 |
|  2 |       11 |                  1 | 2016-06-28 20:33:16 |
|  4 |       12 |                  1 | 2016-06-28 20:33:16 |
+----+----------+--------------------+---------------------+

Mop up calls, can be folded into stored proc:

-- The below is what to run
-- delete from Models where model_id not in (select null); -- this is a safe call (will never do anything)
-- the above is just a null test

delete from Models where model_id not in (select model_id from GoodIds);
-- Error 1451: Cannot delete or update a parent row: a FK constraint is unhappy
-- hey the cascades did not work, can figure that out later
-- Let go bottom up for now. Meaning, to honor FK constraints, kill bottom up.
delete from Model_Hierarchy where parent_id not in (select model_id from GoodIds);
-- 18 rows deleted
delete from Model_Hierarchy where child_id not in (select model_id from GoodIds);
-- 0 rows deleted
delete from Models where model_id not in (select model_id from GoodIds);
-- 9 rows deleted / 3 remain
select * from Models;
+----------+---------+
| model_id | type_id |
+----------+---------+
|        9 |       5 |
|       11 |       1 |
|       12 |       1 |
+----------+---------+
Drew
  • 24,851
  • 10
  • 43
  • 78
  • For the first set in `Model_Hierarchy`: `9` is the final product, hence cannot be deleted. `11` is used by `9` directly and `12` is used by `11`, hence both `11` and `12` should stay. As for the other direction: `9` is contained (packaged) by `4` and `4` is contained in both `1` and `2`. All other relationships contain `1` and so they should stay. The only one being deleted should be `10`. – emihir0 Jun 29 '16 at 07:43
  • As for the second set: `9` uses `1` and `1` uses `2`, `3`, `4` and `5`. `2` uses `7` and `4` uses `8`. `7` uses `10` and `8` uses `12`. `10` uses `11`. As for what contains `9`: it is contained in `4` and `4` is contained in `1`, hence the only `model_id`s that don't contain `9` or are not contained by `9` in any way is `6`, the other `model_id`s should remain. – emihir0 Jun 29 '16 at 07:47