7

I am a school teacher very new to MS SQL server. Everyone is suggesting to try this site out. Here goes!

I am trying to write queries to test different types of outcome measures for the participation in a academic program. There are several different ways to calculate this outcome measurement I would like to try. The outcome which I am trying to Calculate is: What is the % of participants that are retained during six months of the program? I am testing different ways to define participant and different time ranges. There are 4 queries I am trying to produce. Unfortunately, I have to use for different tables: Attendance, Status, Deenrolled, Inactive. I have included sample data from each below

Queries

  1. A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at July 1st 2012 and Ending June 30th 2013, so the length of the fiscal year.If a participant is deenrolled or inactive they are dropped.
  2. A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at January 1st 2013. If a participant is deenrolled or becomes inactive they are dropped.
  3. A participant is defined as everyone that attended a class at least twice a week starting at January 1st 2013 until today
  4. A participant is defined as a student's enrollment start date until they are deenrolled or become inactive.

Participant (Numerator) participant / all students which were served (Denominator)

The 4 query outputs I am looking for are different versions of this:

Example

Participants    Served   Percent_Served
75               100        75%      

I have been messing around with different versions of the query below

 SELECT 
Count (distinct ID) as Count, 
  Count  ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) > 2 as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive) 
or ID not in (select ID from Deenrolled) 
GROUP BY ID

and

 SELECT 
Count (distinct ID) as Count, 
  Count  ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) - Enrolled_Date  as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive) 
or ID not in (select ID from Deenrolled) 
GROUP BY ID

Any programming assistance for these queries is greatly appreciated.

Below are the sample/example datasets.

Attendence_date is the date a student participated in one class.

CREATE TABLE Attendance (
    ID int,
    Attendence_date datetime
    )

INSERT INTO Attendance VALUES 
(4504498,  '7/1/2012'),
(4504498,  '7/2/2012'),
(4504498,   '7/3/2012'),
(4504498,   '7/4/2012'),
(4504498,   '7/5/2012'),
(4504498,   '7/8/2012'),
(4504498,   '7/9/2012'),
(4504498,   '7/10/2012'),
(4504498,   '7/11/2012'),
(4504498,   '7/12/2012'),
(4504498,   '7/1/2012'),
(4504498,   '7/2/2012'),
(4504498,   '7/3/2012'),
(4504498,   '7/4/2012'),
(4504498,   '7/5/2012'),
(4504498,   '7/8/2012'),
(4504498,   '7/9/2012'),
(4504498,   '7/10/2012'),
(4504498,   '7/11/2012'),
(4504498,   '7/12/2012'),
(9201052,   '7/15/2012'),
(9201052,   '7/16/2012'),
(9201052,   '7/17/2012'),
(9201052,   '7/17/2012'),
(9201052,   '7/18/2012'),   
(7949745,   '7/17/2012'),   
(7949745,   '7/18/2012'),
(7949745,   '7/23/2012'),   
(7949745,   '7/23/2012'),   
(7949745,   '7/24/2012'),
(7949745,   '7/26/2012'),
(7949745,   '7/26/2012'),   
(7949745,   '8/8/2012'),    
(7949745,   '8/8/2012'),    
(7949745,   '11/5/2012'),   
(7949745,   '11/5/2012'),   
(7949745,   '11/5/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/7/2012'),   
(7949745,   '11/7/2012'),   
(7949745,   '11/7/2012')

Here is the contains the enrollment date.

CREATE TABLE [Status] (
    ID int,
    Intake_Date datetime ,
   Engaged_Date datetime ,
   Enrolled_Date datetime)
INSERT INTO [Status] VALUES 
(7949745, '3/7/2012',   '7/17/2012', '3/8/2012'),
(4504498, '2/21/2013',  '3/5/2013',  '3/22/2013'),
(1486279, '4/18/2013',  '5/7/2013',   '5/20/2013'),
(9201052, '5/15/2012',  '7/13/2012',  '5/15/2012'),
(1722390, '3/5/2012',   '8/27/2012', '3/8/2012'),
(7735695, '9/7/2012',   '9/7/2012',  '9/28/2012'),
(9261549, '3/7/2012',   '7/24/2012', '3/8/2012'),
(3857008, '3/15/2013',  '3/18/2013', '4/3/2013'),
(8502583, '3/14/2013',     '4/15/2013', '5/3/2013'),
(1209774,  '4/19/2012',  '1/1/2012',   '4/24/2012') 

Here is the contains the de-enrollment date.

CREATE TABLE Deenrolled (
    ID int,
    Deenrolled_Date datetime)
INSERT INTO Deenrolled  VALUES 
(7949745,    '2/4/2013'),
(5485272,    '07/08/2013'),
(8955628,    '01/10/2013'),
(5123221,    '7/8/2013'),
(5774753,    '7/18/2013'),
(3005451,    '2/18/2013'),
(7518818,    '05/29/2013'),
(9656985,    '6/20/2013'),
(2438101,    '7/17/2013'),
(1437052,    '7/25/2013'),
(9133874,    '4/25/2013'),
(7007375,    '6/19/2013'),
(3178181,    '5/24/2013')

And inactive

CREATE TABLE Inactive (
    ID int,
   Effect_Date datetime)
INSERT INTO Inactive VALUES 
(1209774,       '10/12/2012'),
(5419494,       '10/12/2012'),
(4853049,       '10/9/2012'),
(1453678,       '5/23/2013'),
(1111554,       '7/16/2012'),
(5564128,       '2/15/2013'),
(1769234,       '7/16/2012')
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
Matty34
  • 85
  • 5
  • 8
    Welcome, a couple tips: People like to see what you've tried so far, what's not working (errors you're receiving or sample output that's not as you desire). Sample desired output is also helpful, as it's often easier to understand data than description of data. You can also use SQL Fiddle to set up your sample table(s) for people to query: http://sqlfiddle.com/#!3 – Hart CO Jul 29 '13 at 20:40
  • @steoleary I am very new, so so I have been looking for examples online. I just added in some of the code I have been trying to pecie togther. – Matty34 Jul 29 '13 at 20:56
  • @goatco thank you. I will check it out. I am very new and I have been looking throw a lot of eamples online. – Matty34 Jul 29 '13 at 21:02
  • 1
    This question needs a better title, one that reflects the programming question, not the problem domain. –  Jul 29 '13 at 22:23
  • what version of SQL Server do you work on? – Roman Pekar Jul 30 '13 at 11:34

2 Answers2

1

Give these a shot (changed because I missed a huge part of the question)

    SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '07/01/2012', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '07/01/2012',Attendence_date) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '07/01/2012' AND '06/30/2013'
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');

SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '01/01/2013',Attendence_date) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '01/01/2013' AND '06/30/2013'
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');

SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '01/01/2013',GetDate()) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '01/01/2013' AND GetDate()
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', GetDate());

SELECT DISTINCT(Attendance.ID)
  FROM Attendance
  WHERE Attendance.ID NOT IN (SELECT ID FROM Deenrolled)
      AND ID NOT IN (SELECT ID FROM Inactive);

and an sqlfiddle to help you out: http://sqlfiddle.com/#!6/97230/3 Good luck!

Barbara Laird
  • 12,599
  • 2
  • 44
  • 57
  • What about at least twice a week FOR A SIXTH MONTH condition? – Roman Pekar Jul 30 '13 at 03:21
  • @Roman - The 2nd query is doing that since the OP had a start date of 1/1/2013 for that case. I just did the math to come up with 6/30/2013 as 6 months out. You could replace '06/30/2013' with DATEADD(MONTH, 6,'01/01/2013'), but why make the query do the math when it'll always be a constant. – Barbara Laird Jul 30 '13 at 15:31
  • well, correct me if I'm wrong, but at least twice a week for a sixth months means every week of sixth months? AFAIK, your query will return ID even if he/she attends only one week more than one time? – Roman Pekar Jul 30 '13 at 15:39
  • take a look at my answer :) – Roman Pekar Jul 30 '13 at 18:52
1

Well I should say that is not an easy one. The main problem was to solve 'at least twice a week for a sixth months' part - it's easy to calculate twice a week, but it should be 6 continous months!

While I've tried to solve it, I've found absolutely brilliant answer by Niels van der Rest - Finding continuous ranges in a set of numbers. So I'll give you general query for the Part 1, you can change parameters and get result for Part 2:

declare @Weeks int, @PerWeek int, @StartDate date, @EndDate date, @count

select
    @StartDate = '20120701',
    @EndDate = '20130630',
    @Weeks = 26, -- 6 month or 26 weeks
    @PerWeek = 2 -- twice per week

select @count = count(distinct A.ID)
from Attendance as A
where
    A.Attendence_date between @StartDate and @EndDate and
    A.ID not in (select T.ID from Deenrolled as T) and
    A.ID not in (select T.ID from Inactive as T)

;with CTE as (
    -- Week numbers, filter by dates
    select
        A.ID,
        datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
    from Attendance as A
    where
        A.Attendence_date between @StartDate and @EndDate and
        A.ID not in (select T.ID from Deenrolled as T) and
        A.ID not in (select T.ID from Inactive as T)
  ), CTE2 as (
    -- Group by week, filter less then @PerWeek per week, calculate row number
    select
        Wk, ID,
        row_number() over (partition by ID order by Wk) as Row_Num
    from CTE
    group by Wk, ID
    having count(*) >= @PerWeek
)
-- Final query - group by difference between week and row_number
select 100 * cast(count(distinct ID) as float) / @count
from CTE2
group by ID, Wk - Row_Num
having count(*) >= @Weeks

I've created SQL FIDDLE EXAMPLE, you can test the query.

Part 3 is easy

declare @PerWeek int, @StartDate date

select
    @StartDate = '20130101',
    @PerWeek = 2 -- twice per week

select @count = count(distinct A.ID)
from Attendance as A
where
    A.Attendence_date >= @StartDate and
    A.ID not in (select T.ID from Deenrolled as T) and
    A.ID not in (select T.ID from Inactive as T)

;with CTE as (
    -- Week numbers, filter by dates
    select
        A.ID,
        datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
    from Attendance as A
    where
        A.Attendence_date >= @StartDate and
        A.ID not in (select T.ID from Deenrolled as T) and
        A.ID not in (select T.ID from Inactive as T)
  ), CTE2 as (
    -- Group by week, filter less then @PerWeek per week
    select distinct ID
    from CTE
    group by Wk, ID
    having count(*) >= @PerWeek
)
select 100 * cast(count(*) as float) / @count from CTE2

Part 4 seems a bit unclear for me, could you clarify?

Community
  • 1
  • 1
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197