Is there a shortcut to normalizing a table where the columns=rows?

Question

Suppose you had the mySQL table describing if you can mix two substances

Product   A    B    C
---------------------
A         y    n    y
B         n    y    y
C         y    y    y

The first step would be to transform it like

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

But then you have duplicate information. (eg. If A can mix with B, then B can mix with A), so, you can remove several rows to get

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    B    y
B    C    n
C    C    y

While the last step was pretty easy with a small table, doing it manually would take forever on a larger table. How would one go about automating the removal of rows with duplicate MEANING, but not identical content?

Thanks, I hope my question makes sense as I am still learning databases

`A A y`, `B B y` etc seem pretty obvious. Why don't you generate the table correctly, right from the start? Just take one of the triangles, excluding the diagonal. — mvds, Jul 26 '10 at 18:44

score 4 · Accepted Answer · answered Jul 26 '10 at 18:37

4

If it's safe to assume that you're starting with all relationships doubled up, e.g.

If A B is in the table, then B A is guaranteed to be in the table.

Then all you have to do is remove all rows where P2 < P1;

DELETE FROM `table_name` WHERE `P2` < `P1`;

If this isn't the case, you can make it the case by going through the table and inserting all the duplicate rows if they don't already exist, then running this.

answered Jul 26 '10 at 18:37

Jamie Wong

18,104
8
63
81

I'm glad you found my answer helpful - but I would recommend waiting longer for responses before accepting. It's quite possible that there's a much better solution than this, or there's a critical flaw in mine. – Jamie Wong Jul 26 '10 at 18:48
good answer -- as long as the assumption that all relationships are doubled holds true. If you can't be sure the assumption will hold, though, you should check for duplicates before deleting. – dave Jul 26 '10 at 19:13
All relationships are doubled, I generated the table with a nested for loop. Thanks again – thomas Jul 26 '10 at 19:33

score 2 · Answer 2 · answered Jul 26 '10 at 18:58

2

I don't think it's necessary in your situation, but as an intellectual exercise, you could build on Jamie Wong's solution and prevent non-duplicated columns from being removed with an EXISTS clause. Something like this:

DELETE FROM `table_name` AS t1
  WHERE `P2` < `P1`
    AND EXISTS (SELECT NULL FROM `table_name` AS t2
      WHERE t1.`P1` = t2.`P2` AND t1.`P2` = t2.`P1`);

It pretty much just makes sure that there's a duplicate before deleting anything.

(My MySQL syntax might be a little off; it's been a while.)

answered Jul 26 '10 at 18:58

Isabelle Wedin

1,345
10
26

good point. If you use the delete approach, you should check for duplicates before deleting. You can avoid deleting and the duplicate check altogether, though (see my answer above). – dave Jul 26 '10 at 19:11
@dave Probably shouldn't say "above". Unless you're that certain it will receive more upvotes :P – Jamie Wong Jul 26 '10 at 19:31

dave · Answer 3 · 2010-07-26T19:14:21.717

1

Step 1 (as you've already done): Transform to Table2

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

Step 2: ReOrder Columns, Select Distinct

SELECT DISTINCT
   IF P1<P2 THEN P1 ELSE P2 END as P1, -- this puts the smallest value in P1
   IF P1>P2 THEN P1 ELSE P2 END as P2 -- this puts the largest value in P2
FROM Table2
WHERE NOT P1=P2  --(Assuming records like A, A, y are not interesting)

I'm not a mySQL guy, so you might need to check the if/then syntax, but this seems conceptually ok anyway.

edited Jul 26 '10 at 19:14

answered Jul 26 '10 at 18:57

dave

1,344
10
16

btw -- maybe this is just homework -- but as for a real-world concern -- you might consider mixtures with more than 2 parts. E.g., it might be ok to mix A with B; and ok to mix B with C; but mixing all three A, B & C together might turn out to be something you'd want to avoid. – dave Jul 26 '10 at 19:02

Is there a shortcut to normalizing a table where the columns=rows?

3 Answers3