17

I have data in following format.

match_id   team_id   won_ind
----------------------------
37          Team1    N
67          Team1    Y
98          Team1    N
109         Team1    N
158         Team1    Y
162         Team1    Y
177         Team1    Y
188         Team1    Y
198         Team1    N
207         Team1    Y
217         Team1    Y
10          Team2    N
13          Team2    N
24          Team2    N
39          Team2    Y
40          Team2    Y
51          Team2    Y
64          Team2    N
79          Team2    N
86          Team2    N
91          Team2    Y
101         Team2    N

Here match_ids are in chronological order, 37 is the first and 217 is the last match played by team1. won_ind indicated whether the team won the match or not.

So, from the above data, team1 has lost its first match, then won a match, then lost 2 matches, then won 4 consecutive matches and so on. Now I'm interested in finding the longest winning streak for each team.

Team_id   longest_streak
------------------------
Team1     4
Team2     3

I know how to find this in plsql, but i was wondering if this can be calculated in pure SQL. I tried using LEAD, LAG and several other functions, but not getting anywhere.

I have created sample fiddle here.

Noel
  • 10,152
  • 30
  • 45
  • 67
  • 1
    I don't have time to replicate the writeup, but [this excellent article](http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data) discusses how to accomplish this using self-joins and sums. – eykanal Jul 24 '13 at 16:20

4 Answers4

14
with original_data as (
  select 37  match_id, 'Team1' team_id, 'N' won_id from dual union all
  select 67  match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 98  match_id, 'Team1' team_id, 'N' won_id from dual union all
  select 109 match_id, 'Team1' team_id, 'N' won_id from dual union all
  select 158 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 162 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 177 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 188 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 198 match_id, 'Team1' team_id, 'N' won_id from dual union all
  select 207 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 217 match_id, 'Team1' team_id, 'Y' won_id from dual union all
  select 10  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 13  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 24  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 39  match_id, 'Team2' team_id, 'Y' won_id from dual union all
  select 40  match_id, 'Team2' team_id, 'Y' won_id from dual union all
  select 51  match_id, 'Team2' team_id, 'Y' won_id from dual union all
  select 64  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 79  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 86  match_id, 'Team2' team_id, 'N' won_id from dual union all
  select 91  match_id, 'Team2' team_id, 'Y' won_id from dual union all
  select 101 match_id, 'Team2' team_id, 'N' won_id from dual 
),
----------------------------------------------------------------------
new_streaks as (
--
--  Identifying new streaks.
--  ------------------------
--
    select
      match_id,
      team_id,
      won_id,
--
--  A new streak is identfied if 
--
    case when
--
--    a) won_id = 'Y' and
--
      won_id = 'Y' and
--
--    b) the previous won_id = 'N':
--    
      lag(won_id) over (partition by team_id order by match_id) = 'N' 
--
--    
      then 1 
--
--    All other cases: no new streak:
      else 0 
-- 
     end new_streak
    from
      original_data
),
-------------------------------
streak_no as (
--
--  Assigning a unique number to each streak.
--  -----------------------------------------
--
select
--
    match_id,
    team_id,
--
--  In order to be able to count the number of records
--  of a streak, we first need to assign a unique number
--  to each streak:
--
    sum(new_streak) over (partition by team_id order by match_id) streak_no
--
from
    new_streaks 
where
--  We're only interested in «winning streaks»:
    won_id = 'Y'
),
-----------------------------------------------
--
--  Counting the elements per streak
--  --------------------------------
--
records_per_streak as (
select 
  count(*) counter,
  team_id,
  streak_no
from
  streak_no
group by
  team_id,
  streak_no
)
------------------------------------------------
--
--   Finally: we can find the «longest streak»
--   per team:
--
select
  max(counter) longest_streak,
  team_id
from
  records_per_streak 
group by team_id
;
René Nyffenegger
  • 39,402
  • 33
  • 158
  • 293
  • Excellent.. though it is similar to Slartibartfast's answer, this is very easy to understand. – Noel Jul 26 '13 at 02:02
  • For compatibility with PostgreSQL & SQLite, replace the *original_data* WITH query: `WITH original_data (match_id, team_id, won_id) AS (VALUES (37, 'Team1', 'N'), (67, 'Team1', 'Y'), ... )` – Clint Pachl Sep 03 '20 at 22:05
8

This should work, Fiddle here: http://sqlfiddle.com/#!4/31f95/27

SELECT   team_id, MAX(seq_length) AS longest_sequence
      FROM (SELECT   team_id, COUNT(*) AS seq_length
                 FROM (SELECT team_id, won_ind,match_id, SUM(new_group) OVER(ORDER BY match_id) AS group_no
                         FROM (SELECT   team_id, won_ind, match_id,
                                        DECODE(LAG(won_ind) OVER(ORDER BY match_id), won_ind, 0, 1) AS new_group
                                   FROM matches
                               ORDER BY team_id))
                WHERE won_ind = 'Y'
             GROUP BY team_id, group_no)
   GROUP BY team_id
   ORDER BY 2 DESC, 1;
Vrashabh Irde
  • 14,129
  • 6
  • 51
  • 103
  • just a question, what do the numbers mean in your order by clause? – Jafar Kofahi Jul 24 '13 at 16:34
  • 1=team_id,2=longest_sequence, the columns for select – Vrashabh Irde Jul 24 '13 at 16:37
  • @Slartibartfast i think `partition by team_id` is needed both in `DECODE` and `SUM` functions. When i ran the inner select statements separately, there was some discrepancy in the calculation of new_group and group_no – Noel Jul 26 '13 at 03:34
6

I had a similar task on Teradata, modified it to run on Oracle:

SELECT
   team_id,
   MAX(cnt)
FROM
 (
   SELECT
      team_id,
      COUNT(*) AS cnt
   FROM 
    (
      SELECT
        team_id, 
        match_id,
        won_ind,
        SUM(CASE WHEN won_ind <> 'Y' THEN 1 END) 
        OVER (PARTITION BY team_id 
              ORDER BY match_id 
              ROWS UNBOUNDED PRECEDING) AS dummy
      FROM matches
    ) dt
   WHERE won_ind = 'Y'
   GROUP BY team_id, dummy
 ) dt
GROUP BY team_id;
dnoeth
  • 59,503
  • 4
  • 39
  • 56
3

Using a variant of an answer I posted here

select
    team_id,
    max(wins)
  from
    (
     select
            a.team_id,
            a.match_id amatch,
            b.match_id bmatch,
    (select count(distinct match_id) 
       from matches matches_inner
      where a.team_id = matches_inner.team_id
        and matches_inner.match_id between a.match_id and b.match_id) wins
      from
            matches a
            join matches b on a.team_id = b.team_id 
                      and b.match_id > a.match_id
     where
    not exists 
    (select 'x'
       from matches matches_inner
      where a.team_id = matches_inner.team_id
        and matches_inner.match_id between a.match_id and b.match_id
        and matches_inner.won_ind = 'N')

group by team_id
Community
  • 1
  • 1
Joe
  • 6,767
  • 1
  • 16
  • 29
  • 1
    Nice. But, if the longest streak is 1, this doesn't return the value. Changing `b.match_id > a.match_id` to `b.match_id >= a.match_id` should solve that. – Noel Jul 27 '13 at 05:53
  • There is also missing closing parenthesis at the end. – jakejgordon Oct 19 '15 at 02:37