0

I have data on group session attendance by topic, in which one row is one user attending a session on a given date/time. A group offering can have a set number of sessions, e.g. 10, and they are usually offered at the same time (StartTime) and day of week (DayOfWeek).

I would like to identify start and end dates of the group offering, that is the first session and the last session for a given cohort. A cohort would be determined from the list of the same users who have attended the group sessions.

Table: ---GroupSessions--- GroupSessionDate UserID Topic StartTime DayOfWeek ArrivalStatus Jan-08-2015 1 A 11:30:00 AM Thursday Arrived Jan-08-2015 2 A 11:30:00 AM Thursday Arrived Jan-08-2015 3 A 11:30:00 AM Thursday Arrived Jan-08-2015 4 A 11:30:00 AM Thursday Arrived Jan-15-2015 1 A 11:30:00 AM Thursday Arrived Jan-15-2015 2 A 11:30:00 AM Thursday Arrived Jan-15-2015 3 A 11:30:00 AM Thursday Arrived Jan-15-2015 4 A 11:30:00 AM Thursday Arrived Jan-22-2015 1 A 11:30:00 AM Thursday Arrived Jan-22-2015 2 A 11:30:00 AM Thursday Arrived Jan-22-2015 3 A 11:30:00 AM Thursday Arrived Jan-22-2015 4 A 11:30:00 AM Thursday Missed May-15-2015 5 A 09:00:00 AM Friday Arrived May-15-2015 2 A 09:00:00 AM Friday Arrived May-15-2015 6 A 09:00:00 AM Friday Arrived May-22-2015 5 A 09:00:00 AM Friday Arrived May-22-2015 6 A 09:00:00 AM Friday Arrived May-22-2015 2 A 09:00:00 AM Friday Missed May-29-2015 5 A 09:00:00 AM Friday Arrived May-29-2015 6 A 09:00:00 AM Friday Arrived May-29-2015 2 A 09:00:00 AM Friday Missed

In the example above, there are 2 cohorts. Cohort 1 would be comprised of users 1,2,3,and 4 and nearly all of whom have attended the group offering (topic A) between Jan-08-2015 and Jan-22-2015. The same users attended Jan-15-2015 session as the Jan-08-2015 session and nearly all also attended Jan-22-2015 session.

The Cohort 2 (also for topic A) is comprised of users 2,5,6, with offering dates May 15 to 29th, 2015.

The Number of sessions is not set per offering, as it can change depending on demand, so I cannot factor in the number of sessions from the offering date.

I have looked at the Oracle/SQL: Split two inter-related lists into independent cohorts but the question remains unanswered.

Normally I would do the inspection visually and assign users to cohorts, but I have tens of thousands of rows, and was hoping there was a more efficient way to do this with SQL. I am running MSSQL 2014.

I have tried using OUTER APPLY with the table on itself but I am not really getting the result I am looking for.

Could you point me in the right direction please?

SQL:

SELECT src.UserID
  ,src.GroupSessionDate
  ,src.StartTime
  ,src.DayofWeek
  ,src.Topic
  ,prevsessdata.GroupSessionDate
  FROM GroupSessions src OUTER APPLY
  (SELECT TOP 1 * GroupSessions prevsd WHERE src.Topic=prevsd.Topic
  AND src.UserID=prevsd.UserID AND src.DayOfWeek=prevsd.DayOfWeek
  AND src.StartTime=prevsd.StartTime 
  AND prevsd.GroupSessionDate<src.GroupSessionDate) prevsessdata

EDIT: Assumptions that can be made:

  • sessions generally happen 1 week apart (exception: holidays)
  • all sessions in the same cohort would have the same topic, be on same day of the week, and same start time
  • users can stop attending but they would be listed in the sessions they missed - (sorry I missed this crucial detail in the original question!)
  • same users will generally be listed for all the sessions of group offering, although users can attend groups on multiple topics, so there may be instances where a user is listed also as being part of another cohort (here user 2 attended offering in Jan and in May). But this would be limited to 1 or 2 users of 10-12 in a cohort.

Desired output: For each Topic, StartTime, and DoW, list Cohort Start/End Dates.

Thanks!

  • What sort of output are you expecting? And, more importantly, how exactly do you figure out what is or isn't a cohort? e.g. you say 2,5,6 is a cohort but 2 only attends one of those group sessions in the second batch. Are there any assumptions that can be made (like, group sessions will always be the same topic, same day, same time a week apart if they are the same "cohort")? – ZLK Jan 20 '16 at 04:14
  • Sorry I missed this. I will edit the posting. – user2716667 Jan 20 '16 at 10:00

1 Answers1

0

I think you're looking for something like this:

; WITH T AS (
    SELECT CONVERT(DATE, REPLACE(GroupSessionDate, '-', ' ')) GroupSessionDate, Topic, StartTime, [DayOfWeek]
        , STUFF((SELECT ', ' + CONVERT(NVARCHAR, UserID)
            FROM GroupSessions
            WHERE GroupSessionDate = G.GroupSessionDate
            AND Topic = G.Topic
            AND StartTime = G.StartTime
            AND [DayOfWeek] = G.[DayOfWeek]
            ORDER BY UserID
            FOR XML PATH ('')), 1, 2, '') [Cohort]
    FROM GroupSessions G
    GROUP BY GroupSessionDate, Topic, StartTime, [DayOfWeek])
SELECT Cohort, MIN(GroupSessionDate) SessionStartDate, MAX(GroupSessionDate) SessionEndDate, Topic, StartTime, [DayOfWeek]
FROM T
GROUP BY Topic, StartTime, [DayOfWeek], Cohort, DATEDIFF(dd, 0, GroupSessionDate) % 7
ORDER BY MIN(GroupSessionDate)

A simple group by statement. Really, from what it looks like you're asking, all you need to do is group by topic, starttime, dayofweek and groupsessiondate to find each "cohort" then sort through that again to find the minimum / maximum dates for each of these.

If this isn't producing the right results, there's probably some other criterion you need to use to narrow it down a bit further.

ZLK
  • 2,864
  • 1
  • 10
  • 7
  • Thank you very much, this is exactly the result I was looking for! One question, I am not familiar with the expression: DATEDIFF(dd, 0, GroupSessionDate) % 7. Does it check if the DateDiff result divided by 7 returns 0? – user2716667 Jan 20 '16 at 23:28
  • It's just the modulo of the date's day (so it checks that the actual dates are on the same day). It probably isn't actually necessary here since days of the week are recorded in your table (in another table, you might use that to check the days of the week), but it can't hurt to make sure someone hasn't entered a day incorrectly at some point. – ZLK Jan 21 '16 at 01:02