0

I am a newbie to SQL Server with a rather complex SQL task and none of the solutions I have found so far is fit for my problem below.

I have a database table that currently holds membership joining information (extract below) with one line for each year/period that a member have paid for membership; It is possible for a member to cancel their membership for one year/period and rejoin at a later date. We need to represent each continuous membership as one line such contents of Extract 1 below becomes Extract 2:

Extract 1

Member_No    Start_Date                 End_Date
---------    ----------                 --------
10       2010-01-01 00:00:00.000    2011-01-01 00:00:00.000
10       2011-01-01 00:00:00.000    2012-01-01 00:00:00.000
10       2012-01-01 00:00:00.000    2013-01-01 00:00:00.000
10       2013-01-01 00:00:00.000    2014-01-01 00:00:00.000
20       2005-01-01 00:00:00.000    2006-01-01 00:00:00.000
20       2006-01-01 00:00:00.000    2007-01-01 00:00:00.000
20       2007-01-01 00:00:00.000    2008-01-01 00:00:00.000
30       2005-01-01 00:00:00.000    2006-01-01 00:00:00.000
30       2006-01-01 00:00:00.000    2007-01-01 00:00:00.000
30       2007-01-01 00:00:00.000    2008-01-01 00:00:00.000
30       2008-01-01 00:00:00.000    2009-01-01 00:00:00.000
30       2009-01-01 00:00:00.000    2010-01-01 00:00:00.000
30       2010-10-13 00:00:00.000    2011-01-01 00:00:00.000
30       2011-01-01 00:00:00.000    2012-01-01 00:00:00.000
30       2012-01-01 00:00:00.000    2013-01-01 00:00:00.000

I need to replace contents of the above table with the below - there are many more records in the table in question and I would greatly appreciate any assistant anyone can offer:

Extract 2

Member_No    Start_Date                 End_Date
---------    ----------                 --------
10       2010-01-01 00:00:00.000    2014-01-01 00:00:00.000
20       2005-01-01 00:00:00.000    2008-01-01 00:00:00.000
30       2005-01-01 00:00:00.000    2010-01-01 00:00:00.000
30       2010-10-13 00:00:00.000    2013-01-01 00:00:00.000
Justin
  • 9,634
  • 6
  • 35
  • 47
Babs
  • 195
  • 2
  • 3
  • 8

1 Answers1

0

Here is a better approach (original approach still below):

with ms as (
      select ms.*,
             (select 1 from Membership ms2 where ms2.Member_no = ms.Member_no and cast(ms2.Start_date as date) = cast(ms.End_date as date)
             ) as LinkedToNext
      from MemberShip ms
     )
select member_no, MIN(start_date) as start_date, MAX(end_date) as end_date
from (select ms.*,
             (select top 1 end_date
              from ms ms2 where ms2.Member_no = ms.Member_No and ms2.LinkedToNext is NULL and ms2.Start_Date >= ms.Start_Date
             order by end_date desc
             ) as grouping
      from ms
     ) ms1
group by Member_no, grouping

The CTE determines if something is linked to the next. The grouping is then the end date of the first record afterwoards that is not linked to the next for a linked chain, this is always the same.

In response to your comment about "more than one row". This means that a member has two memberships starting on the same date. If this is the only form of overlap, you can fix this as:

with ms as (
      select member_no, start_date, max(end_date) as end_date
             (select 1 from Membership ms2 where ms2.Member_no = ms.Member_no and cast(ms2.Start_date as date) = cast(ms.End_date as date)
             ) as LinkedToNext
      from MemberShip ms
      group by member_no, start_date
     )

Much simpler than my original approach, which I'm still keeping below:

with ms as (
      select ms.*,
             (select 1 from Membership ms2 where ms2.Member_no = ms.Member_no and ms2.Start_date = ms.End_date
             ) as LinkedToNext,
             (select ms2.End_Date from Membership ms2 where ms2.Member_no = ms.Member_no and ms2.Start_date = ms.End_date
             ) as NextEndDate,
              (select 1 from Membership ms2 where ms2.Member_no = ms.Member_no and ms2.End_date = ms.Start_date
              ) as LinkedToPrev
      from MemberShip ms
     )
select member_no, MIN(start_date) as start_date,
       MAX(coalesce(NextEndDate, End_Date)) as  end_date
from (select ms.*,
             (ROW_NUMBER() over (partition by Member_no order by Start_Date) -
              ROW_NUMBER() over (partition by Member_no, LinkedToNext order by Start_date)
             ) as grouping
      from ms
     ) ms1
where not (LinkedToNext is null and LinkedToPrev = 1)
group by member_no, grouping

This uses a correlated subquer to determine if one membership is linked to the next (based on the start date being the same as the end date --you can actually add in a fudge factor, if you like).

Then it uses a trick. It enumerates the rows by start_date for each member. It also enumerates the rows by start date for each member by whether or not they are linked to the next. The difference is constant for a chain that can be grouped together.

The final step is to group this to get the final result.

The one complication is getting the right end date. The initial code does not include the end of the chain in the calculation. So, I borrow the end date from the next record. It then filters out this ending record.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hey Gordon, thanks for your response. The result of the first post is closer to what I am trying to achieve; however, it contains extra rows - could you please help with an updated SQL to remove rows 2, 4, 7 and 8? (result from running your sql against provided data in my original post below) `member_no start_date end_date 10 2010-01-01 2014-01-01 10 2013-01-01 2014-01-01 20 2005-01-01 2008-01-01 20 2007-01-01 2008-01-01 30 2005-01-01 2010-01-01 30 2010-10-13 2013-01-01 30 2009-01-01 2010-01-01 30 2012-01-01 2013-01-01` – Babs Apr 04 '13 at 14:46
  • @Babs . . . My bad. I had left out the `order by` in the second correlated subquery. – Gordon Linoff Apr 04 '13 at 15:44
  • Sorry to be a pain - the "better approach" does work better. However I get the error below when I run the query on the entire database/on more than 925 rows of data: `Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.` The error above appears to be due to not using 'JOIN' in the solution. I have tried at no avail to fix this - can you please help? – Babs Apr 04 '13 at 16:54
  • @Babs . . . That means you have overlapping memberships for a given member. You might want to ask another question about that condition. It makes the problem substantially different from your current problem (unless the simple fix in my answer suffices). – Gordon Linoff Apr 04 '13 at 17:22
  • The latest solution works fine however when I introduce an additional column, it doesn't. Can you please help with what changes I'd need to make to allow result to also be grouped by e.g. membership type/have result as below? – Babs Jul 17 '13 at 13:53
  • Can you please help with what changes I'd need to make to allow result to also be grouped by e.g. membership type/have result as below? Member_No Type Start_Date End_Date --------- --------- ---------- -------- 10 SENIOR 2010-01-01 2014-01-01 30 JUNIOR 2005-01-01 2007-01-01 30 SENIOR 2007-10-13 2010-01-01 30 SENIOR 2010-10-13 2013-01-01 – Babs Jul 17 '13 at 14:00