Longest Consecutive Days Count for BigQuery

Question

Right now I just have an aggregate of how many days a user has worked. I'm trying to change this query to most continuous days worked.

Where u12345 would be 4 and u1 would be 2.

Is this possible to do with a BigQuery statement?

EDIT I am Kind of close with the following query but my u1 is getting 3 instead of 2.

SELECT MIN(e.timestamp) as date_created, e.uid, COUNT(e.uid) + 1 AS streak
FROM OnSite e
LEFT JOIN OnSite ee 
  ON e.uid = ee.uid 
AND DATE(e.timestamp) = DATE(DATE_ADD(ee.timestamp, INTERVAL -1 DAY))
WHERE ee.uid IS NOT NULL
GROUP BY e.uid;

Schema (MySQL v5.7)

CREATE TABLE OnSite
    (`uid` varchar(55), `worksite_id`  varchar(55), `timestamp` datetime)
;

INSERT INTO OnSite
    (`uid`, `worksite_id`, `timestamp`)
VALUES
  ("u12345", "worksite_1", '2019-01-01'),
  ("u12345", "worksite_1", '2019-01-02'),
  ("u12345", "worksite_1", '2019-01-03'),
  ("u12345", "worksite_1", '2019-01-04'),
  ("u12345", "worksite_1", '2019-01-06'),
  ("u1", "worksite_1", '2019-01-01'),
  ("u1", "worksite_1", '2019-01-02'),
  ("u1", "worksite_1", '2019-01-05'),
  ("u1", "worksite_1", '2019-01-06')

;

Query #1

SELECT    uid, COUNT(DISTINCT timestamp) Total
FROM      OnSite
GROUP BY  uid;

| uid    | Total |
| ------ | ----- |
| u1     | 4     |
| u12345 | 5     |

View on DB Fiddle

Maybe you need to process data after the query. I don't know query to do that. — Maxi Schvindt, Jan 16 '19 at 21:53
@Cochu I added an edit that I'm getting close but I'm one number off — bryan, Jan 16 '19 at 21:55
u12345 would be 3, not 4 because "2019-01-04" doesn't have next day — Maxi Schvindt, Jan 16 '19 at 22:03
The thought is that 2019-01-01 to 2019-01-04 is 4 consecutive days. So it would be 4 @cochu — bryan, Jan 16 '19 at 22:14
I think you will either need a stored procedure or some application code as to do this requires looping through all the data. The reason `COUNT` doesn't work is because your `u1` has more than one streak, and days from both streaks are being counted in your result. — Nick, Jan 16 '19 at 22:52
@Nick hmm, well my current result is just counting total days all together. But I see what you mean. Would it be possible to just list both streaks? It would suck to get 5 years worth of data for every single user and code a consecutive day streak — bryan, Jan 16 '19 at 22:55
please check my answer. It gives you expected result with no use of stored procedures — Derviş Kayımbaşıoğlu, Jan 16 '19 at 23:00
It does seem @Simonare has proved me wrong as long as you are using MySQL 8 — Nick, Jan 16 '19 at 23:01
It appears so, unfortunately this was for BigQuery and it doesn't like it. But I never mentioned it @Nick — bryan, Jan 16 '19 at 23:04
@bryan If the answer doesn't actually solve your problem you should leave the question open so you can possibly get a solution which does work for you (you can just upvote the answer instead). I've added the `bigquery` tag to your question so that answers should be limited to code that works there. — Nick, Jan 16 '19 at 23:07

score 9 · Accepted Answer · answered Jan 17 '19 at 05:11

Below is for BigQuery Standard SQL

In case if you are interested in max consecutive days of the users on the same worksite:

#standardSQL
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid, worksite_id ORDER BY ts) grp
    FROM (
      SELECT uid, worksite_id, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid, worksite_id ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

In case if worksite does not matter and you are looking just for max consecutive days:

#standardSQL
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid ORDER BY ts) grp
    FROM (
      SELECT uid, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

You can test, play any of above with he sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'u12345' uid, 'worksite_1' worksite_id, DATE '2019-01-01' ts UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-02' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-03' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-04' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-06' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-01' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-02' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-05' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-06' 
)
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid ORDER BY ts) grp
    FROM (
      SELECT uid, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

with result:

Row uid     max_consecuitive_days    
1   u12345  4    
2   u1      2

Thank you Mikhail! This works exactly as expected. I did have a question about what you meant by for the `users on the same worksite` bit. I'm having trouble understanding what is going on here. It works a bit differently if I were to add `FROM project.dataset.table WHERE worksite_id = 'x'` (which might be the totally wrong way of focusing on one worksite) — bryan, Jan 17 '19 at 13:57
what I meant is - you can look for `Longest Consecutive Days Count` for each user no matter which worksite that user was working on. But another option would be to look for `Longest Consecutive Days Count` for each user with extra condition of being on the same worksite during those days. I do not know nature of your data and meaning of worksite - so provided you with two versions of query for respective options. hope this answers your follow up question :o) — Mikhail Berlyant, Jan 17 '19 at 19:32

Derviş Kayımbaşıoğlu · Answer 2 · 2019-01-16T22:56:16.760

4

does this fit for you?

set @gr=1;
select uid, max(cnt) max_cnt from (
  select uid, grp, count(*) cnt from (
    select uid, 
      case when ifnull(DATE_ADD(oldDate, INTERVAL 1 DAY), timestamp)= timestamp then 
        @gr 
      else  
        @gr := @gr +1 
      end grp
    from
    (
        SELECT    
          uid, 
          timestamp, 
          lag(timestamp) over (partition by uid order by timestamp asc) as oldDate
        FROM      OnSite
    ) t
  )t2
  group by uid, grp
  )t3
group by uid

Result

| uid    | max_cnt |
| ------ | ------- |
| u1     | 2       |
| u12345 | 4       |

DB Fiddle

edited Jan 16 '19 at 22:56

answered Jan 16 '19 at 22:04

Derviş Kayımbaşıoğlu

28,492
4
50
72

I think there is some errors in the syntax, it's not running for me in the DB Fiddle – bryan Jan 16 '19 at 22:05
I added my Db Fiddle – Derviş Kayımbaşıoğlu Jan 16 '19 at 22:10
Can you check the link? I'm just seeing my first query there – bryan Jan 16 '19 at 22:11
Yeah, your DB Fiddle only make a group by – Maxi Schvindt Jan 16 '19 at 22:29
I am sorry, I forgot to fork :) I updated the link now – Derviş Kayımbaşıoğlu Jan 16 '19 at 22:32
I dont think this is giving me the right data back. u12345 would be 4 and u1 would be 2. – bryan Jan 16 '19 at 22:40
Please check my answer. It now gives you expected result – Derviş Kayımbaşıoğlu Jan 16 '19 at 22:57
2

Awesome work Simonare, it works as expected! Thanks for looking into this. Unfortunately doesn't work for my BigQuery purposes but I never said anything about that to you so I've marked it as correct – bryan Jan 16 '19 at 23:05
Still is a good answer even if it turns out not to be of use to OP. – Nick Jan 16 '19 at 23:22

Longest Consecutive Days Count for BigQuery

2 Answers2