I am trying to number some records in MySQL (5.5.44-0 on Ubuntu), grouped by another column (you will see what I mean below). I am adapting the solution described at Running Sums for Multiple Categories in MySQL, except I'm just numbering, not summing.
The tables involved are quite huge with almost 100 columns, so let's simplify the demonstration first by creating derived tables with only the important columns involved. Apologies for not sharing a SQL Fiddle, because it doesn't look like it's replicable unless done with big volume of data, which I cannot share:
Creating the tables:
CREATE TABLE `inquiries_test` (
`id` int(11) NOT NULL DEFAULT '0',
`motive` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`inquiry_id`),
KEY `motive` (`motive`)
);
insert into inquires_test select id, motive from inquiries;
CREATE TABLE `leads_test` (
`lead_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`inquiry_id` int(11) DEFAULT NULL,
KEY `id` (`lead_id`)
);
insert into leads_test select lead_id, created_at, inquiry_id;
CREATE TABLE `lead_inserts` (
`lead_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`cnt` int(11) DEFAULT NULL
);
You will notice above that data from inquiries_test and leads_test came from the actual production tables. The importance of that will come to play later. Now populating lead_inserts:
playground>insert into lead_inserts (cnt, created_at, lead_id)
-> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt
-> , l.created_at
-> , @id := l.lead_id as local_resouce_id
-> FROM leads_test l join inquiries_test i on (l.inquiry_id=i.id)
-> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias
-> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (14.30 sec)
Records: 2172774 Duplicates: 0 Warnings: 0
playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at | cnt |
+---------+---------------------+------+
| 117 | 2012-06-23 00:13:09 | 1 |
| 117 | 2014-09-14 04:30:37 | 2 |
| 117 | 2015-01-27 22:34:41 | 3 |
| 117 | 2015-03-19 19:33:51 | 4 |
| 118 | 2014-12-24 17:47:15 | 1 |
| 118 | 2015-01-23 21:30:09 | 2 |
| 118 | 2015-04-07 21:33:43 | 3 |
| 118 | 2015-04-10 17:00:04 | 4 |
| 118 | 2015-05-12 21:59:49 | 5 |
+---------+---------------------+------+
So far so good - the value of cnt "resets" for every new lead_id. Now given that leads_test and inquiries_tests are basically leads and inquiries with the other columns removed, it makes sense to expect that if I modify the insert statement to use the original tables, the result should be the same, right? But look:
playground>truncate table lead_inserts;
Query OK, 0 rows affected (0.14 sec)
playground>insert into lead_inserts (cnt, created_at, lead_id)
-> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt
-> , l.created_at
-> , @id := l.lead_id as local_resouce_id
-> FROM leads l join inquiries i on (l.inquiry_id=i.id)
-> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias
-> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (17.25 sec)
Records: 2172774 Duplicates: 0 Warnings: 0
playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at | cnt |
+---------+---------------------+------+
| 117 | 2012-06-23 00:13:09 | 1 |
| 117 | 2014-09-14 04:30:37 | 1 |
| 117 | 2015-01-27 22:34:41 | 1 |
| 117 | 2015-03-19 19:33:51 | 1 |
| 118 | 2014-12-24 17:47:15 | 1 |
| 118 | 2015-01-23 21:30:09 | 1 |
| 118 | 2015-04-07 21:33:43 | 1 |
| 118 | 2015-04-10 17:00:04 | 1 |
| 118 | 2015-05-12 21:59:49 | 1 |
+---------+---------------------+------+
What happened to the numbering? Other observations while using the original tables:
- If I do not process all records and specify only a few lead_id's, the computation comes out right.
- If I remove the INSERT clause and just run it as a select (with a LIMIT clause to just show 50 rows output), the computation comes out right.
So, is this a bug I hit, or am I missing something? In real life I cannot use the process above as a workaround - I really have to use leads and inquiries because there are other columns from those tables that have to be part of lead_inserts.
Thanks!