4

Let's say we have a table like this:

id | group_id | TEXT
---+----------+-----
 1 |          | NBA
 2 |          | NHL
 3 |          | NBA
 4 |          | NHL
 5 |          | NHL

Is it possible to populate group_id with a MySQL function or query (or anything :) ), using the fact that values repeat (duplicate) in the column TEXT, so that we end up with a table like this?

id | group_id | TEXT
---+----------+-----
 1 |    10    | NBA
 2 |    11    | NHL
 3 |    10    | NBA
 4 |    11    | NHL
 5 |    11    | NHL

  • Why are you looking for a numeric value when you already have your key? You're essentially duplicating columns, hence your database isn't normalized anymore. – STT LCU Aug 01 '13 at 07:16
  • It's an intermediate step for further processing by numeric group_id. The solution below with HEX looks simplest, but it creates alphanumeric (!) group_id values like 4E4241, while I'd like to have ONLY numeric ones like 424241... maybe you know how to force HEX to create numeric values only? – Serge Aug 01 '13 at 07:28
  • The question still stands: I don't get why you NEED a numeric translation of the group when you already have a group identifier (NBA, NHL and the like are perfectly valid group identifiers). – STT LCU Aug 01 '13 at 07:41
  • Edited the answer to convert the hex value to decimal – Maxim Zhukov Aug 01 '13 at 08:07
  • If it helped you, please mark the correct answer, m8 – Maxim Zhukov Aug 01 '13 at 09:12

3 Answers

3

You could try to use the HEX function:

SELECT id, HEX(Text) as group_id, Text
FROM Tbl

If you want a decimal value, you can convert it from hex:

SELECT id, CONV(HEX(Text), 16, 10) as group_id, Text
FROM Tbl

Result:

ID | GROUP_ID | TEXT
---+----------+-----
 1 |  5128769 | NBA
 2 |  5128769 | NBA
 3 |  5130316 | NHL
 4 |  5130312 | NHH
 5 |  5130316 | NHL
 8 |  4342081 | BAA
 9 |  4342081 | BAA

SQLFiddle
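
Note that, per the MySQL manual, CONV() works with 64-bit precision, so CONV(HEX(Text), 16, 10) is only collision-free for values up to 8 bytes; longer strings overflow the 64-bit range and can map to the same number. A quick sanity check, in case longer codes ever appear in TEXT:

-- 8 characters (8 bytes) still fit into 64 bits
SELECT CONV(HEX('ABCDEFGH'), 16, 10);
-- 9 characters exceed 64 bits: CONV saturates, so distinct strings can collide
SELECT CONV(HEX('ABCDEFGHI'), 16, 10), CONV(HEX('ABCDEFGHJ'), 16, 10);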

Maxim Zhukov
1
-- map each distinct text value to an auto-increment number, starting at 10
CREATE TEMPORARY TABLE nums (n int AUTO_INCREMENT PRIMARY KEY, txt varchar(4)) AUTO_INCREMENT=10;
INSERT INTO nums (txt) SELECT DISTINCT text FROM table1;
-- copy the generated numbers back into the original table
UPDATE table1 INNER JOIN nums ON txt=text SET group_id=n;

SQLfiddle
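
With the question's sample data, nums should map NBA→10 and NHL→11 (the AUTO_INCREMENT=10 clause sets the starting value; note, though, that the order of DISTINCT rows is not formally guaranteed), so table1 ends up exactly as the desired result. A quick check, assuming the question's column names:

-- inspect the generated text-to-number mapping and the updated table
SELECT n, txt FROM nums ORDER BY n;
SELECT id, group_id, text FROM table1 ORDER BY id;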

Carsten Massmann
  • Very elegant and helpful solution! – Serge Aug 01 '13 at 06:09
  • When I use this approach on larger volumes (more than 10,000 rows, for example) I get the following error: ERROR 1104 (42000) at line 1: The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. Could you advise how to include SET SQL_BIG_SELECTS=1 in your command? Or should I add SQL_BIG_SELECTS=1 to my.cnf? Thank you in advance! – Serge Aug 11 '13 at 07:53
  • The `SELECT` in the `INSERT` statement only looks at a single table, so I am quite surprised that MySQL complains about MAX_JOIN_SIZE. Can you run the SELECT command on its own to find out whether it actually returns the expected output? The second possibility could be the `UPDATE` statement: it might run into difficulties if the JOIN was not included properly ... (just a guess). – Carsten Massmann Aug 11 '13 at 08:35
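
Regarding the MAX_JOIN_SIZE error from the comments above: SQL_BIG_SELECTS is a session variable, so it can be raised with a plain SET statement issued in the same session, right before the UPDATE; no my.cnf change is needed. A minimal sketch, reusing the statements from this answer:

-- raise the limit for this session only
SET SESSION SQL_BIG_SELECTS = 1;
-- or, alternatively: SET SESSION MAX_JOIN_SIZE = DEFAULT;
UPDATE table1 INNER JOIN nums ON txt = text SET group_id = n;
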
1

Looking at my old answer, I just thought of another way to solve the problem without a temporary table:

CREATE TABLE tbl (id int, grpid int, text varchar(10));
INSERT INTO tbl (id,text) VALUES (1,'NBA'),(2,'NBA'),
   (3,'NHL'),(4,'NHH'),(5,'NHL'),(8,'BAA'),(9,'BAA');

SET @i:=100; -- set the start sequence number for grpid
UPDATE tbl INNER JOIN (
  -- number the distinct text values in alphabetical order:
  -- BAA=101, NBA=102, NHH=103, NHL=104
  SELECT @i:=@i+1 gid, text FROM (
    SELECT DISTINCT text FROM tbl ORDER BY text ) dt ) gi
ON tbl.text=gi.text
SET tbl.grpid=gi.gid;

tbl after the UPDATE:

| ID | GRPID | TEXT |
---------------------
|  1 |   102 |  NBA |
|  2 |   102 |  NBA |
|  3 |   104 |  NHL |
|  4 |   103 |  NHH |
|  5 |   104 |  NHL |
|  8 |   101 |  BAA |
|  9 |   101 |  BAA |

sqlfiddle
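
As a side note: on MySQL 8.0+ the same numbering can be written with a window function instead of a user variable. A sketch, assuming MySQL 8.0+ and the tbl layout above:

-- DENSE_RANK() assigns identical ranks to identical text values,
-- numbering the groups 101, 102, ... in alphabetical order of text
UPDATE tbl
JOIN (
  SELECT id, 100 + DENSE_RANK() OVER (ORDER BY text) AS gid
  FROM tbl
) ranked ON tbl.id = ranked.id
SET tbl.grpid = ranked.gid;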

Carsten Massmann
  • Sorry for the delay in reply, my Internet connection was down. But the sad and strange thing is that your new code AGAIN gives me the same error: #1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I can't add the line sql_big_selects = 1 to my.cnf because then MySQL fails to start; max_join_size in my.cnf is set to 10000000. Could we somehow use SET SQL_BIG_SELECTS=1 in your code (if MySQL so badly wants to see it there)? – Serge Aug 11 '13 at 12:21
  • So we have a situation where the code works perfectly on a table with 2,000 records, but the problem arises when processing another table with 27,000 records... which just confirms that the code works! But something is tied to the number of rows to process. – Serge Aug 11 '13 at 15:47
  • @Serge very strange, but I have no further idea why this happens here. 27,000 records are by no means a large quantity for a MySQL database. Maybe there are some DBAs out there who know a bit more about this kind of problem? Please, help us! ;-) – Carsten Massmann Aug 11 '13 at 15:59
  • I agree with you, the situation is very strange. Both of your solutions work perfectly, and 27,000 rows is not a large volume for MySQL. But still the same :( Hopefully somebody will add an idea. – Serge Aug 12 '13 at 07:32
  • I just extended my [SQLfiddle](http://www.sqlfiddle.com/#!2/4f6a1/17) to 117656 rows. It runs without problems. If I multiply the rows again by 7 I run into a timeout for the table generation (>60s), but there is no message like `MAX_JOIN_SIZE` exceeded. – Carsten Massmann Aug 12 '13 at 10:46
  • That's interesting, let's investigate the problem without hurry. Did you use only the 4 TEXT variations mentioned above, i.e. NBA, NHL, NHH and BAA? Maybe the problem is not just the quantity of rows to process: there is a big difference between 117,000 rows from which only 4 group_ids are created (based on NBA, NHL, NHH, BAA) and 117,000 rows that create many more group_ids based on NBA, NHL, NHH, BAA, LTR, AAO, LLC etc. Say, not 4 group_ids as above but 10,000 group_ids. Could you try such a test on SQLfiddle? – Serge Aug 13 '13 at 06:37
  • Oops... after 2 days... are there no professionals here to discuss the problem? – Serge Aug 15 '13 at 14:22
  • I did actually test it again. This time not under SQLfiddle, as that is not the right place when it comes to volume testing, but on one of my remote MySQL hosts. I generated 117656 records again with arbitrary 3-letter codes. Theoretically 26*26*26=17576 different combinations were possible but only 5406 actually occurred in the table, since I chose the letter codes via a `CEIL(RAND()*22)+64` function call. The sorting took its time (actual time > 60s) but I got the desired result. – Carsten Massmann Aug 15 '13 at 16:28
  • Thank you very much for the test... unfortunately it means something is wrong in my configuration :( and I can't figure out what it might be. Could you advise a variation of your code where SET SESSION SQL_BIG_SELECTS = 1 can be inserted, if MySQL so badly wants to see it there? – Serge Aug 16 '13 at 07:41
  • I don't know how to help you with your particular MySQL settings, but maybe we can pin down the source of your problem a bit closer? Apply the solution I put forward in my first post in individual steps (this time make it *not* a temporary table): 1. `CREATE TABLE nums (n int AUTO_INCREMENT PRIMARY KEY, txt varchar(4)) AUTO_INCREMENT=10;` then 2. `INSERT INTO nums (txt) SELECT DISTINCT text FROM table1;` — at which step do you get the error message? Maybe you can use a `LIMIT` on the second query and try again? – Carsten Massmann Aug 16 '13 at 13:24
  • I've run your test. The first two steps pass without a problem. I get the error message on step 3, when I use: UPDATE table1 INNER JOIN nums ON txt=text SET group_id=n. I can't use LIMIT there, so again I'd need to insert SET SQL_BIG_SELECTS=1 somehow into step 3... – Serge Aug 18 '13 at 12:34
  • Have you checked out this SO post: http://stackoverflow.com/questions/950465/mysql-sql-big-selects --> maybe setting an index on the columns `txt` in `nums` and `text` in `tbl` will make the difference, as suggested in the post? (See the sketch after this thread.) – Carsten Massmann Aug 18 '13 at 18:00
  • Thank you again for your support... it seems we can close the issue. It's strange, but when, instead of increasing MAX_JOIN_SIZE, I commented out the max_join_size line in my.cnf, everything started to work... I have Plesk 11.5 on the server, and it seems it also affects MySQL. So it's not about your code but about something in Plesk+MySQL :) Both of your solutions work perfectly! Have a nice day! – Serge Aug 19 '13 at 06:02
  • Glad to hear! But is that a reason to "unaccept" my answer to your original question? ;-) – Carsten Massmann Aug 19 '13 at 10:23
  • Did I miss something :) ? I'm accepting ALL your answers! :) It's a pity I can't vote more than once. Best regards! – Serge Aug 19 '13 at 16:09
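
For reference, the indexing suggestion from the comment thread above would look like this; with indexes on both join columns the optimizer's row estimate stays small, which is what the MAX_JOIN_SIZE check compares against. A sketch, assuming the table names from the temporary-table answer:

ALTER TABLE nums ADD INDEX idx_txt (txt);
ALTER TABLE table1 ADD INDEX idx_text (text);
-- the UPDATE join can now use the indexes on both sides
UPDATE table1 INNER JOIN nums ON txt = text SET group_id = n;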