22

Sorry for the vague title (I just don't know how to describe this conundrum)

Give the following schedule table for a classroom:

Classroom CourseName Lesson StartTime EndTime
1001 Course 1 Lesson 1 0800 0900
1001 Course 1 Lesson 2 0900 1000
1001 Course 1 Lesson 3 1000 1100
1001 Course 2 Lesson 10 1100 1200
1001 Course 2 Lesson 11 1200 1300
1001 Course 1 Lesson 4 1300 1400
1001 Course 1 Lesson 5 1400 1500

I would like to group the table to display this:

Classroom CourseName StartTime EndTime
1001 Course 1 0800 1100
1001 Course 2 1100 1300
1001 Course 1 1300 1500

Basically we are looking at a schedule that show which crouse is using what classroom during a certain timespan...

My initial thought was: Group by Classroom and CourseName and take Max and Min for start\end time but that will not give me the time spans it will show as if Course 1 is using the Classroom from 08:00 - 16:00 with no break in the middle.

Kostas Nitaf
  • 428
  • 1
  • 2
  • 12
Mortalus
  • 10,574
  • 11
  • 67
  • 117
  • I don't know if it can help you, but try this: http://dba.stackexchange.com/questions/31281/count-consecutive-rows-in-mysql Probably it is much more easier if do this within the language you are using the database. (If you are using) – Dávid Szabó Jun 16 '14 at 13:20
  • I haven't actually tried this, but what if you row_number over to get a unique number for each consecutive group (you can order by date). Once you have done this you should be able to do the group by as you suggested above. – Andrew Jun 16 '14 at 13:26

3 Answers3

22

If you're using SQLServer 2012 or better you can use LAG to get the previous value of a column, then SUM() OVER (ORDER BY ...) to create a rolling sum, in this case one that count the change of the CourseName, that can be used as the GROUP BY anchor

With A AS (
  SELECT ClassRoom
       , CourseName
       , StartTime
       , EndTime
       , PrevCourse = LAG(CourseName, 1, CourseName) OVER (ORDER BY StartTime)
  FROM   Table1
), B AS (
  SELECT ClassRoom
       , CourseName
       , StartTime
       , EndTime
       , Ranker = SUM(CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END)
                OVER (ORDER BY StartTime, CourseName)
  FROM   A
)
SELECT ClassRoom
     , CourseName
     , MIN(StartTime) StartTime
     , MAX(EndTime) EndTime
FROM   B
GROUP BY ClassRoom, CourseName, Ranker
ORDER BY StartTime

SQLFiddle demo

Serpiton
  • 3,676
  • 3
  • 24
  • 35
  • Great answer! Clever use of the `SUM()` function. I had to adjust your query to fit my problem by adding `partition by ClassRoom` to the `OVER()` clause in the `LAG()` and `SUM()` functions in order to be able to handle multiple class rooms. Thanks. – Mauricio Morales Oct 30 '17 at 00:38
12

The query determines each rows EndTime by using NOT EXISTS to make sure no other class or course of a different type is scheduled between a course range's StartTime and EndTime and then uses MIN and GROUP BY to find the StartTime.

The NOT EXISTS part ensures that there aren't "breaks" between the StartTime and EndTime ranges by searching for any rows that have an EndTime between StartTime and EndTime but belong to a different CourseName or CourseRoom.

SELECT    
    t0.ClassRoom,
    t0.CourseName,
    MIN(t0.StartTime),
    t0.EndTime
FROM (
    SELECT 
    t1.ClassRoom,
    t1.CourseName,
    t1.StartTime,
    (
        SELECT MAX(t2.EndTime)
        FROM tableA t2
        WHERE t2.CourseName = t1.CourseName
        AND t2.ClassRoom = t1.ClassRoom
        AND NOT EXISTS (SELECT 1 FROM tableA t3
            WHERE t3.EndTime < t2.EndTime 
            AND t3.EndTime > t1.EndTime
            AND (t3.CourseName <> t2.CourseName 
            OR t3.ClassRoom <> t2.ClassRoom)
        )
    ) EndTime
    FROM tableA t1
) t0 GROUP BY t0.ClassRoom, t0.CourseName, t0.EndTime

http://www.sqlfiddle.com/#!6/39d4b/9

FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
  • Okay so that works .. thank you .. but i'm having trouble to understand the last part where it says NOT EXSISTS .... can you elaborate it about the logic ? – Mortalus Jun 16 '14 at 20:01
  • brilliant solution... +1 for that – Gil Peretz Jun 17 '14 at 06:12
  • Could you explain, why when it query 'the first Course 1', when calculating Max(EndTime), it produce (1100) instead of (1500)? – Mark Oct 26 '16 at 03:17
  • 1
    @Mark because the subquery selects the largest end time where a different course that came between the start and end time does not exist. between 0800 and 1500 there was course 2 at 1300, therefore 1500 was not eligible. between 0800 and 1100 there were no other courses. – FuzzyTree Oct 26 '16 at 03:32
1
     CREATE TABLE Classroom(Classroom VARCHAR(100), CourseName  VARCHAR(100),  Lesson    VARCHAR(100), StartTime  VARCHAR(100), EndTime  VARCHAR(100))
 INSERT INTO Classroom
SELECT '1001','Course 1','Lesson 1 ','0800','0900'
UNION SELECT '1001','Course 1','Lesson 2 ','0900','1000'
UNION SELECT '1001','Course 1','Lesson 3 ','1000','1100'
UNION SELECT '1001','Course 2','Lesson 10','1100','1200'
UNION SELECT '1001','Course 2','Lesson 11','1200','1300'
UNION SELECT '1001','Course 1','Lesson 4 ','1300','1400'
UNION SELECT '1001','Course 1','Lesson 5 ','1400','1500'

SELECT * FROM Classroom

;WITH CTE_ClassRooms AS (
SELECT *,ROW_NUMBER() over(partition by classroom,CourseName order by StartTime) AS R FROM Classroom A 
WHERE NOT EXISTS(SELECT 1 FROM Classroom B WHERE B.Classroom = A.Classroom AND B.CourseName = A.CourseName AND B.StartTime = A.EndTime)
UNION ALL
SELECT B.*,R fROM CTE_ClassRooms A JOIN Classroom B ON B.Classroom = A.Classroom AND B.CourseName = A.CourseName AND A.StartTime = B.EndTime
)

--SELECT * FROM CTE_ClassRooms order by Classroom,CourseName,R

SELECT Classroom,CourseName,MIN(StartTime),MAX(EndTime)
FROM CTE_ClassRooms
GROUP BY Classroom,CourseName,R
  • 1
    Welcome to Stack Overflow, please try to provide some additional info instead of just a query. – Marcel Dec 04 '17 at 21:19