2

I have a table:

 UNIT_ID | YEAR | MONTH |  VAR
---------+------+-------+------
    1    | 2015 |   1   |   0
    1    | 2015 |   2   |   0
    1    | 2015 |   3   |   0
    2    | 2015 |   1   |  10
    2    | 2015 |   2   |  10
    2    | 2015 |   3   |  10
    1    | 2015 |   4   |   5
    1    | 2015 |   5   |   5
    1    | 2015 |   6   |   5
    2    | 2015 |   4   |  10
    2    | 2015 |   5   |   3
    2    | 2015 |   6   |   3
    3    | 2016 |   1   |   3
    3    | 2016 |   2   |   3
    3    | 2016 |   3   |   3
    3    | 2016 |   4   |   3
    2    | 2016 |   6   |   0
    2    | 2016 |   7   |   0
    2    | 2016 |   8   |   0

I want to know which units have a sequence bigger than 3 zeros or bigger than 4 values repeated. Grouped by year. So, my result table would be like this:

1 | 2015 | true
2 | 2015 | true
2 | 2016 | true 

I have found this solution but unfortunately I could not adapt to my case. I need also that the query is in MySQL.

Community
  • 1
  • 1
Felipe Costa
  • 128
  • 9

1 Answers1

1

You could just join them 4 times. Last join is a left join to allow the case for 3 0's.

select a.unit_id, a.year, 'true'
from tbl a
join tbl b on a.unit_id = b.unit_id and a.year = b.year and a.month+1 = b.month and a.var = b.var
join tbl c on b.unit_id = c.unit_id and b.year = c.year and b.month+1 = c.month and b.var = c.var
left join tbl d on c.unit_id = d.unit_id and c.year = d.year and c.month+1 = d.month and c.var = d.var
where a.var = 0 or d.var is not null;

Faster and more generic solution. It scans the table only once, and uses user defined variables (@pu for previous unit_id, @py for previous year, etc) to remember the previous row:

select distinct unit_id, year
from (
  select unit_id, `year`, `month`, `var`, 
    if(unit_id=@pu and `year`=@py and `month`=@pm+1 and `var`=@pv, @i:=@i+1, @i:=1)*
    if(@pu:=unit_id,1,1)*if(@py:=`year`,1,1)*if(@pm:=`month`,1,1)*if(@pv:=`var`,1,1) as c
  from table1 a
  join (select @pu:=null, @py:=null, @pm:=null, @pv:=null, @i:=1) b
  order by unit_id, `year`, `month`, `var`) a
group by unit_id, `year`, `var`
having (`var` = 0 and max(c) >= 3) or (`var` != 0 and max(c) >= 4);

fiddle

Fabricator
  • 12,722
  • 2
  • 27
  • 40
  • Hello, Thanks for the answer. I am trying to apply on my data but since there are millions of rows this query is taking a little bit to perform (more than 3 hours now). Probably I can fine tune my indexes but I would like to know if there is another way with less computational cost and yet more generic, for instance, if I want to change the criteria from 4 consecutives to 6... – Felipe Costa Feb 04 '16 at 09:54
  • yes. I found a faster and more generic solution! see updated answer – Fabricator Feb 04 '16 at 17:26
  • Hey, thank you very much, works like a charm. I have to start using variables. – Felipe Costa Feb 09 '16 at 16:48