0

I am looking for an update statement that will group terms by language in the following table

CREATE TABLE _tempTerms(
      ID int(8) unsigned NOT NULL AUTO_INCREMENT,
      TTC_ART_ID mediumint(8) unsigned,
      TTC_TYP_ID mediumint(8) unsigned,
      Name varchar(200),
      Value varchar(200),
      ID_Lang tinyint(3) unsigned,
      Sequence smallint unsigned,
      Group_ID int(8) unsigned DEFAULT 0,
      PRIMARY KEY(TTC_ART_ID, TTC_TYP_ID, Name, Value),
      UNIQUE KEY(ID)
      );

All data except Group_ID is inserted into the table. I need to update the table so that I auto-generate new Group_IDs and the Group_ID for all records with same combination of TTC_ART_ID, TTC_TYP_ID and Sequence will get the same Group_ID. I guess I need a variable to store the current value for Group_ID and so far I experimented with

SET @group_id:=1;
UPDATE _tempTerms 
SET Group_ID = (@group_id := @group_id + 1);

which just gives a new group_id to every new record. I believe I need a SELECT Statement somewhere to check if there is a group_id already given, but I am confused on how I go about it.

Thank you

Drew
  • 24,851
  • 10
  • 43
  • 78
  • btw those are called [User Variables](http://dev.mysql.com/doc/refman/5.7/en/user-variables.html). Not [Local variables](http://dev.mysql.com/doc/refman/5.7/en/declare-local-variable.html). – Drew Jun 19 '16 at 16:14
  • so it will be an update with a join pattern. The join can be on a derived table that has a `group by` or `distinct` and an incremented rank or rownum via a variable. That sort of thing. – Drew Jun 19 '16 at 16:20
  • @Drew Thank you for your correction and suggestion. –  Jun 19 '16 at 16:25
  • Someone will answer this shortly. If they don't just ping me. – Drew Jun 19 '16 at 16:25

1 Answers1

1

Schema:

create database xGrpId; -- create a test db
use xGrpId; -- use it

CREATE TABLE _tempTerms(
      ID int(8) unsigned NOT NULL AUTO_INCREMENT,
      TTC_ART_ID mediumint(8) unsigned,
      TTC_TYP_ID mediumint(8) unsigned,
      Name varchar(200),
      Value varchar(200),
      ID_Lang tinyint(3) unsigned,
      Sequence smallint unsigned,
      Group_ID int(8) unsigned DEFAULT 0,
      PRIMARY KEY(TTC_ART_ID, TTC_TYP_ID, Name, Value),
      UNIQUE KEY(ID)
      );

-- truncate table _tempTerms;
insert _tempTerms(TTC_ART_ID,TTC_TYP_ID,Name,Value,ID_Lang,Sequence) values
(1,2,'n','v1',66,4),
(1,1,'n','v2',66,4),
(1,1,'n','v3',66,3),
(1,1,'n','v4',66,4),
(1,1,'n','v5',66,4),
(1,1,'n','v6',66,3),
(2,1,'n','v7',66,4),
(1,2,'n','v8',66,4);

View them:

select * from _tempTerms order by id;
select distinct TTC_ART_ID,TTC_TYP_ID,Sequence from _tempTerms; 
-- 4 rows

-- update _tempTerms set Group_ID=0; -- clear before testing

The query:

update _tempTerms t
join
(   select TTC_ART_ID,TTC_TYP_ID,Sequence,@rn:=@rn+1 as rownum
    from
    (   select distinct TTC_ART_ID,TTC_TYP_ID,Sequence
        from _tempTerms
        -- put your `order by` here if needed
        order by TTC_ART_ID,TTC_TYP_ID,Sequence
    ) d1
    cross join (select @rn:=0) as xParams
) d2
on d2.TTC_ART_ID=t.TTC_ART_ID and d2.TTC_TYP_ID=t.TTC_TYP_ID and d2.Sequence=t.Sequence
set t.Group_ID=d2.rownum;

Results:

select * from _tempTerms order by TTC_ART_ID,TTC_TYP_ID,Sequence;

+----+------------+------------+------+-------+---------+----------+----------+
| ID | TTC_ART_ID | TTC_TYP_ID | Name | Value | ID_Lang | Sequence | Group_ID |
+----+------------+------------+------+-------+---------+----------+----------+
|  3 |          1 |          1 | n    | v3    |      66 |        3 |        1 |
|  6 |          1 |          1 | n    | v6    |      66 |        3 |        1 |

|  2 |          1 |          1 | n    | v2    |      66 |        4 |        2 |
|  4 |          1 |          1 | n    | v4    |      66 |        4 |        2 |
|  5 |          1 |          1 | n    | v5    |      66 |        4 |        2 |

|  1 |          1 |          2 | n    | v1    |      66 |        4 |        3 |
|  8 |          1 |          2 | n    | v8    |      66 |        4 |        3 |

|  7 |          2 |          1 | n    | v7    |      66 |        4 |        4 |
+----+------------+------------+------+-------+---------+----------+----------+

Cleanup:

drop database xGrpId;

d1, d2, and xParams are derived tables. Every derived table needs a name. The purpose of xParams and the cross join is merely to bring in a variable to initialize the row number. This is because mysql lacks CTE functionality found in other RDBMS's. So, don't overthink the cross join. It is like saying LET i=0.

Drew
  • 24,851
  • 10
  • 43
  • 78
  • Thank you for this excellent answer! Just one more thing: My table will be about 100 million rows when I execute this. Maybe you have any tips on indexes to improve performance? With 150 thousand rows it took about 13 secs. –  Jun 19 '16 at 18:09
  • Try it with like 3 to 5 million. Perhaps we can change the distinct to a `group by` after changing the distinct line to an aggregate. But if it is a throw-away routine and it performs ok, why over-engineer it. Let me know. Note, some people use a group by in queries that don't even have aggregates as a trick. Aggregate functions like sum, avg, count ... – Drew Jun 19 '16 at 18:13