2

I have a table in MySQL as below:

ID, COL1, COL2 VALUE
'1', 'OBJ1', 'OBJ2', '5'
'2', 'OBJ1', 'OBJ2', '1'
'3', 'OBJ2', 'OBJ1', '3'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'

Relation between col1 and col2 is independent of position, ie OBJ1 in col1 and OBJ2 in col2 is same as OBJ1 in col2 and OBJ2 in col1. This means that OBJ1 and OBJ2 shares a relationship. Now, this means that the object OBJ1 and OBJ2 have a value of 1,5,3... I want to keep only distinct values ie OBJ1, OBJ2 should occur only once in the table, not even OBJ2,OBJ1. Importantly, I want to retain only the row with HIGHEST value. The result I want is thus:

ID, COL1, COL2 VALUE
'1', 'OBJ1', 'OBJ2', '5'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'

What is the best and efficient way of doing this? I have over 10 million rows.

I have searched in many forums/Google but cannot find the exact answer I am looking for..

James Z
  • 12,209
  • 10
  • 24
  • 44
Aybid
  • 86
  • 1
  • 8

3 Answers3

0

Try this:

SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM mytable AS t1
JOIN (
   SELECT LEAST(COL1, COL2) AS C1,
          GREATEST(COL1, COL2) AS C2,
          MAX(VALUE) AS max_Value
   FROM mytable
   GROUP BY LEAST(COL1, COL2), 
            GREATEST(COL1, COL2)
) AS t2 ON t1.COL1 = t1.C1 AND t1.COL2 = t2.C2 AND t1.VLAUE = t2.max_Value
Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
  • Thanks a lot Betsos. However the row '4', 'OBJ3', 'OBJ1', '4' is getting missed. Any suggestions why!! Also theres a slight typo in the query: ON t1.COL1 = t1.C1 should be ON t1.COL1 = t2.C1 if I am not wrong. – Aybid Nov 25 '16 at 04:38
0

You could use an in clause and subselect grouped by

for solve also the problem related to the distinct pair combination You should organize the data in a proper way

  select 
    id 
  , case when  col1 <= col2 then col1 else col2  end COL1
  , case when  col1 > col2 then col1 else col2 end COL2
  , value 
  from start_table 

then the query became

  SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
  FROM (
    select 
      id 
    , case when  col1 <= col2 then col1 else col2  end COL1
    , case when  col1 > col2 then col1 else col2 end COL2
    , value 
    from start_table 
  )  t1
  where value in ( 
      select max(value) 
      FROM (
        select 
          id 
        , case when  col1 <= col2 then col1 else col2  end COL1
        , case when  col1 > col2 then col1 else col2 end COL2
        , value 
        from start_table 
      ) mytable 
      group by col1, col2
  )

or using an inner join

SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM (
    select 
      id 
    , case when  col1 <= col2 then col1 else col2  end COL1
    , case when  col1 > col2 then col1 else col2 end COL2
    , value 
    from start_table 
  )  t1 
inner join 
  (
    select max(value)  as value
    FROM (
    select 
      id 
    , case when  col1 <= col2 then col1 else col2  end COL1
    , case when  col1 > col2 then col1 else col2 end COL2
    , value 
    from start_table 
  )  mytable 
    group by col1, col2
  ) T2 on t1.value = t2.value
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
  • Thanks Ankit, but I still have the problem of col1-col2 mapping. `code` '1', 'OBJ1', 'OBJ2', '5' '3', 'OBJ2', 'OBJ1', '3' `code` whereas I want only single occurence of the combination of OBJ1 and OBJ2 – Aybid Nov 25 '16 at 04:29
0

Rebuild the table so that no dups are allowed; in the process, get rid of the dups. (And get rid of the apparently useless id.)

CREATE TABLE new (
    col1 ...,
    col2 ...,
    `value` ...,
    PRIMARY KEY(col1, col2),
    INDEX(col2, col2, `value`)
) ENGINE=InnoDB;

INSERT INTO new (col1, col2, `value`)
    SELECT LEAST(col1, col2),
           GREATEST(col1, col2),
           `value`
    ON DUPLICATE KEY UPDATE
           `value` := GREATEST(`value`, VALUES(`value`));

RENAME TABLE real TO old,
             new TO real;

DROP TABLE old;

In the future, you will need this for INSERTing/UPDATEing new rows:

INSERT INTO new (col1, col2, `value`)
    VALUES (?, ?, ?)
    ON DUPLICATE KEY UPDATE
           `value` := GREATEST(`value`, VALUES(`value`));

(This assumes you want to increase value whenever it is already in the table.)

These save space and speed (important for 10M rows): Getting rid of id; having optimal indexes; using InnoDB; etc.

Rick James
  • 135,179
  • 13
  • 127
  • 222