I have data on group session attendance by topic, in which one row is one user attending a session on a given date/time. A group offering can have a set number of sessions, e.g. 10, and they are usually offered at the same time (StartTime) and day of week (DayOfWeek).
I would like to identify start and end dates of the group offering, that is the first session and the last session for a given cohort. A cohort would be determined from the list of the same users who have attended the group sessions.
Table: ---GroupSessions---
GroupSessionDate UserID Topic StartTime DayOfWeek ArrivalStatus
Jan-08-2015 1 A 11:30:00 AM Thursday Arrived
Jan-08-2015 2 A 11:30:00 AM Thursday Arrived
Jan-08-2015 3 A 11:30:00 AM Thursday Arrived
Jan-08-2015 4 A 11:30:00 AM Thursday Arrived
Jan-15-2015 1 A 11:30:00 AM Thursday Arrived
Jan-15-2015 2 A 11:30:00 AM Thursday Arrived
Jan-15-2015 3 A 11:30:00 AM Thursday Arrived
Jan-15-2015 4 A 11:30:00 AM Thursday Arrived
Jan-22-2015 1 A 11:30:00 AM Thursday Arrived
Jan-22-2015 2 A 11:30:00 AM Thursday Arrived
Jan-22-2015 3 A 11:30:00 AM Thursday Arrived
Jan-22-2015 4 A 11:30:00 AM Thursday Missed
May-15-2015 5 A 09:00:00 AM Friday Arrived
May-15-2015 2 A 09:00:00 AM Friday Arrived
May-15-2015 6 A 09:00:00 AM Friday Arrived
May-22-2015 5 A 09:00:00 AM Friday Arrived
May-22-2015 6 A 09:00:00 AM Friday Arrived
May-22-2015 2 A 09:00:00 AM Friday Missed
May-29-2015 5 A 09:00:00 AM Friday Arrived
May-29-2015 6 A 09:00:00 AM Friday Arrived
May-29-2015 2 A 09:00:00 AM Friday Missed
In the example above, there are 2 cohorts. Cohort 1 would be comprised of users 1,2,3,and 4 and nearly all of whom have attended the group offering (topic A) between Jan-08-2015 and Jan-22-2015. The same users attended Jan-15-2015 session as the Jan-08-2015 session and nearly all also attended Jan-22-2015 session.
The Cohort 2 (also for topic A) is comprised of users 2,5,6, with offering dates May 15 to 29th, 2015.
The Number of sessions is not set per offering, as it can change depending on demand, so I cannot factor in the number of sessions from the offering date.
I have looked at the Oracle/SQL: Split two inter-related lists into independent cohorts but the question remains unanswered.
Normally I would do the inspection visually and assign users to cohorts, but I have tens of thousands of rows, and was hoping there was a more efficient way to do this with SQL. I am running MSSQL 2014.
I have tried using OUTER APPLY with the table on itself but I am not really getting the result I am looking for.
Could you point me in the right direction please?
SQL:
SELECT src.UserID
,src.GroupSessionDate
,src.StartTime
,src.DayofWeek
,src.Topic
,prevsessdata.GroupSessionDate
FROM GroupSessions src OUTER APPLY
(SELECT TOP 1 * GroupSessions prevsd WHERE src.Topic=prevsd.Topic
AND src.UserID=prevsd.UserID AND src.DayOfWeek=prevsd.DayOfWeek
AND src.StartTime=prevsd.StartTime
AND prevsd.GroupSessionDate<src.GroupSessionDate) prevsessdata
EDIT: Assumptions that can be made:
- sessions generally happen 1 week apart (exception: holidays)
- all sessions in the same cohort would have the same topic, be on same day of the week, and same start time
- users can stop attending but they would be listed in the sessions they missed - (sorry I missed this crucial detail in the original question!)
- same users will generally be listed for all the sessions of group offering, although users can attend groups on multiple topics, so there may be instances where a user is listed also as being part of another cohort (here user 2 attended offering in Jan and in May). But this would be limited to 1 or 2 users of 10-12 in a cohort.
Desired output: For each Topic, StartTime, and DoW, list Cohort Start/End Dates.
Thanks!