1

I'm studying to interview for a job involving a lot of SQL. I've noticed a few exercises that revolve around returning values based on a sequence across rows, and I would love to know if there's a standard way of doing this. Something akin to the subquery below I've found useful for selecting a max/min value:

( SELECT column FROM table ... ORDER BY column [DESC] LIMIT 1 )

Here's a relevant example from LeetCode:

Table: Stadium

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+
visit_date is the primary key for this table.

Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit. No two rows will have the same visit_date, and as the id increases, the dates increase as well.

Write an SQL query to display the records with three or more rows with consecutive id's, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order.

The query result format is in the following example.

Stadium table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

Result table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

My attempt to solve the problem used a user variable. The code below was my best shot at a solution:

SET @rowIndex = 0;

SELECT s1.id, s1.visit_date, s1.people
FROM ( SELECT @rowIndex:=@rowIndex+1 as rowIndex, s.id, s.visit_date, s.people
    FROM Stadium as s
    WHERE s.people >=100 ) as s1
GROUP BY rowIndex - s1.id, s1.id, s1.visit_date, s1.people
HAVING COUNT(s.id) >= 3
ORDER BY s1.visit_date

The query above has a syntax error somewhere. Message:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT s1.id, s1.visit_date, s1.people FROM ( SELECT @rowIndex := @rowIndex+1 as' at line 4

Does anyone have a favorite way of selecting rows based on a sequence? Perhaps less importantly, can anyone spot the error in my query above?

spheroidic
  • 199
  • 7
  • 1
    Missing comma between `s1.id`and `s1.visit_date` in the GROUP BY. – jarlh Oct 28 '20 at 18:58
  • @jarlh thanks, that's one. Edited to reflect. I ran it again and it looks like there's still an error in the query. Any ideas are appreciated! – spheroidic Oct 28 '20 at 19:29

1 Answers1

0

I would address this as a gaps-and-island problem. Here is an approach using window functions:

select id, visit_date, people
from (
    select s.*, count(*) over(partition by id - rn) cnt
    from (
        select s.*, row_number() over(order by id) rn
        from stadium s
        where people > 100
    ) s
) s
where cnt >= 3

The idea is to filter out days that have less than 100 visits, and then use the difference between id and a monotonically increasing rank to identify the islands (consecutive days with more than 100 visits). We can then retain groups that have more than 3 rows only.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Awesome looks like this does the trick. I have a question about `COUNT(*) OVER(...)` - I think I tried to achieve the same function by using COUNT() in my HAVING statement. My reasoning was that as a window function, COUNT() would return the count for the relevant window, causing the HAVING statement to weed out windows with counts less than 3. Do you think this is correct? – spheroidic Oct 28 '20 at 19:37