3

I am new to SQL Server and this is my first question that I have ever posted to stackoverflow, hopefully, I am doing it, correctly.

Ok, lets say I have a table that has an ID, POSTEDDATE, and an AMOUNT field. I am trying to figure out using today's date and that our fiscal year is from July 1st to June 31st of every year. How would I create SQL that would give me ONLY the IDs that have given in consecutive FISCAL years, starting from TODAY'S DATE? And, it would ONLY give me records of people that have ONLY given CONSECUTIVELY (in other words, ONLY for 2 or more years?

So, for example, here is what I am referring to:

ID    POSTEDDATE    AMOUNT
1     6/15/2016     100.00
1     2/10/2015     5.00
2     6/15/2016     10.00
2     1/15/2016     50.00
3     1/10/2013     10.00
3     1/10/2012     60.00

Therefore, the ONLY record that I would get from this data set would be: ID ConsecYears 1 2

ID#2 gave 2 times in the SAME FISCAL YEAR and ID#3 gave in 2 consecutive years BUT it is not starting from TODAY'S DATE.

Does this make sense? If not, I can better clarify anything that is needed, please, just ask.

Thank you for any help, GrimReaper

Alex Kudryashev
  • 9,120
  • 3
  • 27
  • 36
GrimReaper
  • 43
  • 4

3 Answers3

1

Here's a solution that's pretty similar to @Matt's, but which performed much faster in my testing. I saw your comments that this would be done on a table that contains a lot of data, so I created a table in my database called dbo.Donation and loaded it with 666,667 copies of your sample data set, incrementing the ID values with each copy, for a total of 4,000,002 records. Then I shifted the PostedDate back by a year for every record with ID > 200000, which effectively limits the number of matching donors to 66,667 (one-tenth of the number of copies of your data set). I don't know how closely this matches the actual number of donors you expect to see returned from this query; I just picked a number that seemed reasonable.

On my machine, the query returned the correct result set in about two seconds. So I'm hopeful that it will work for you. The comments in the query explain how it works.

By the way, I also saw your comment that your boss doesn't want a CTE, and I should point out that there's nothing about the CTEs that I'm using that makes them intrinsically slower than an alternate formulation. In fact, it's trivial to write an alternate formulation that uses subqueries in place of CTEs, and when I did so, I found that both queries produced exactly the same execution plan. The CTE version is just easier to read—and therefore easier to maintain—since you can interpret it from the top down rather than from the inside out. Let me know if you have any questions.

-- Compute the current fiscal year.
declare @ThisFiscalYear int = year(getdate()) + case when month(getdate()) <= 6 then 0 else 1 end;

-- This CTE computes the fiscal year of each donation and gets the set of unique fiscal years in
-- which each donor (ID) made a contribution.
with IDYearComboCTE as
(
    select distinct
        D.ID,
        FiscalYear = year(D.PostedDate) + case when month(D.PostedDate) <= 6 then 0 else 1 end
    from
        dbo.Donation D
),

-- The second CTE produces two values for each fiscal year: 
--
--     1. FiscalYearOffset is the number of fiscal years that lie between the current fiscal year
--        and the fiscal year in which the donation was posted, inclusive. So a donation from the
--        current fiscal year will have FiscalYearOffset = 1, one from the previous fiscal year
--        will have FiscalYearOffset = 2, and so on.
--
--     2. FiscalYearIndex is a reverse chronological ordering of the unique fiscal years for each
--        donor. In other words, the most recent donation for every donor, regardless of how long
--        ago it was, will have FiscalYearOffset = 1.
--
-- The important point to realize here is that FiscalYearOffset and FiscalYearIndex will be the
-- same for a given donation IF AND ONLY IF the donor has posted a donation in that year and in
-- every subsequent year up to and including the current fiscal year. If the donor has skipped a
-- year, then FiscalYearOffset will increase by more than one while FiscalYearIndex increases by
-- only one (because its values are always contiguous), and if the donor hasn't given in the
-- current fiscal year, then the record with FiscalYearIndex = 1 will have FiscalYearOffset > 1.
--
FiscalYearOrderingCTE as
(
    select
        C.ID,
        FiscalYearOffset = @ThisFiscalYear - C.FiscalYear + 1,
        FiscalYearIndex = row_number() over (partition by C.ID order by C.FiscalYear desc)
    from
        IDYearComboCTE C
    where
        C.FiscalYear <= @ThisFiscalYear
)

-- Since, as described above, records with FiscalYearOffset = FiscalYearIndex represent contiguous
-- years in which the same donor made a contribution, we select only those records, then limit the
-- result set to those IDs with two or more records (i.e. with donations in two or more contiguous 
-- fiscal years starting with the current fiscal year). 
select
    O.ID,
    ConsecYears = max(FiscalYearIndex)
from
    FiscalYearOrderingCTE O
where
    O.FiscalYearOffset = O.FiscalYearIndex
group by
    O.ID
having
    max(FiscalYearIndex) >= 2;

Edit: I will try to explain a little more thoroughly what the second CTE is doing, since that's where the magic happens.

First, be sure that you understand SQL Server's row_number() function. Because of the partition by C.ID clause, the row numbers generated by this function start over at 1 for each set of records that have the same ID. And because of the order by C.FiscalYear desc clause, we know that for any two records that have the same ID, the record whose fiscal year is later will have a smaller FiscalYearIndex value. Also note that the records we're operating on are those selected by the previous CTE, IDYearComboCTE, and because that query uses distinct, no two records with the same ID will also have the same fiscal year. Therefore, for any set of records that have the same ID, if you were to arrange them in decreasing chronological order by FiscalYear, you'd find that their FiscalYearIndex values form a sequence of contiguous integers that always begins with the value 1.

Second, consider the formula that I'm using to produce FiscalYearOffset: I'm just subtracting fiscal years from a constant which is equal to the latest fiscal year that we expect to occur in the result setǂ, and then adding 1. Some important observations here:

  1. Because we are considering only donations that occurred in or prior to the current fiscal year, the lowest FiscalYearOffset value that can occur is 1, and it will occur precisely for donations that occurred in the current fiscal year.

  2. Because no two records from IDYearComboCTE having the same ID will have the same FiscalYear (as discussed above), it is also true that no two records with the same ID will have the same FiscalYearOffset.

  3. Because of point 2, and because a lower FiscalYear value will produce a higher FiscalYearOffset value, note that for any set of records with the same ID, if you were to arrange them in decreasing chronological order by FiscalYear, you'd find that their FiscalYearOffset values form a strictly increasing sequence. This is very similar to my observation on FiscalYearIndex above, with the important distinctions that the sequence of FiscalYearIndex values for a particular ID always starts with 1 and contains no gaps. Neither of those is true of a sequence of FiscalYearOffset values.

Now, suppose we have a record from IDYearComboCTE that represents the most recent donation from some particular donor. Since it's the most recent, we know that its FiscalYearIndex will be 1, because that's how row_number() works. We also know that its FiscalYearOffset will be 1 if and only if the donation occurred in the current fiscal year. If that most recent donation was from a previous fiscal year, then you can see by the formula for FiscalYearOffset that you'll get a value greater than 1. Therefore, for any donor's most recent donation, we know that FiscalYearOffset = FiscalYearIndex if and only if the donation was in the current fiscal year.

Next, consider a record from IDYearComboCTE that represents the second-most recent donation by some donor. Since it's the second-most recent, we know that its FiscalYearIndex will be 2, because again, that's how row_number() works. Crucially, we also know that its FiscalYearOffset value will be greater than or equal to 2. It cannot be 1 because this is the second-most recent donation year for the donor, and we said above that the same FiscalYear will not occur twice for the same ID in IDYearComboCTE. It will be 2 if the donation is from the previous fiscal year, and it will be some number greater than 2 if the donation is earlier than that.

Here's the critical piece: suppose that in that second-most recent donation record, FiscalYearOffset is greater than 2, meaning that it's greater than FiscalYearIndex. As we go to the third-most recent donation year, fourth-most recent, and so on, for that particular ID, FiscalYearOffset and FiscalYearIndex will never again be equal. This is because with each successive record, we know that the value of FiscalYearIndex will be one greater than the FiscalYearIndex of the previous record. Because of this, and because FiscalYearOffset > FiscalYearIndex for the second-most recent donation, in order for some further record's FiscalYearIndex to "catch up" to the FiscalYearOffset, FiscalYearOffset would have to increase by some number less than 1. But there is no positive integer less than 1, and we already showed above that the set of FiscalYearOffset values for a given ID is a strictly increasing sequence. For FiscalYearOffset to increase by less than 1 is thus impossible.

Therefore, the only records from the second CTE where FiscalYearOffset = FiscalYearIndex are those that represent donations in an unbroken sequence of years beginning with the current fiscal year and working backwards, and for a given ID, the number of donations in that unbroken sequence is equal to the number of records with that ID in the second CTE for which that equality holds.

This is a lot to take in. If it's still unclear, I'd recommend putting some sample data together and then changing the final select from my query to just select * from FiscalYearOrderingCTE so you can see the raw output from that second CTE.

ǂ—While I was writing this edit, I realized that my original query failed to account for the fact that your database might contain future-dated donations, particularly donations occurring after the current fiscal year. This would break the query as originally written, so I've revised it by including where C.FiscalYear <= @CurrentFiscalYear in the second CTE.

Joe Farrell
  • 3,502
  • 1
  • 15
  • 25
  • Thank you SOOOOOO much, Joe. That worked PERFECTLY!!!! I have going over and over the second CTE and trying to understand what it is doing but hopefully, I will understand it, soon. I guess what I was thinking is that you would have to group and then order the FiscalYearOrderingCTE in order to get the proper FiscalYearOffset because if it is not in order, then the years and IDs will be out of order and will not give you the proper consecutive years. So, I am trying to understand how the FiscalYearOffset and the FiscalYearIndex work with each other to give the correct, consecutive amt of years. – GrimReaper Jun 16 '16 at 15:59
  • @GrimReaper, I've updated my post with a greatly expanded explanation of the logic that went into `FiscalYearOrderingCTE`. I hope it helps you. If this query works for you, please do me a favor and accept the answer. :) – Joe Farrell Jun 16 '16 at 18:24
  • So, if I needed to change it so that it DOES NOT include the current fiscal year. In other words, it counts back STARTING from the prior fiscal year and I want records that have a consecutive count of 1 or greater (instead of 2). Would this be the changes that I would make? FiscalYearOffset = @ThisFiscalYear - C.FiscalYear, (remove the '+1') max(FiscalYearIndex) >= 1 (making it 1 instead of 2); – GrimReaper Jun 17 '16 at 20:47
  • Your proposed change to the `HAVING` clause at the end is correct. But rather than changing the formula for `FiscalYearOffset`, I would recommend that you change the value of `@ThisFiscalYear` to be the previous fiscal year instead of the current one (and rename that variable to more accurately describe its contents). That should be all you need. – Joe Farrell Jun 17 '16 at 21:27
  • Actually, `max(FiscalYearIndex)` will *always* be greater than or equal to 1, so if you wanted to, you could remove the clause entirely. – Joe Farrell Jun 17 '16 at 21:29
0

Here is a totally different way and it is a little more nasty because it has selects subselects etc. I wrote it with Common Table Expressions and without to give you and idea of the difference and structure. I got this technique off another post (Find the maximum consecutive years for each ID's in a table(Oracle SQL)) I just had to generate a Fiscal Year and use it with it.

I still think there is a much more elegant way with LAG and LEAD in face I liked the direction I was heading but I need to get back to other things. My guess is you may have to optimize this if you are in a larger data enviroment.

DECLARE @Revenue AS TABLE (Id INT, POSTEDDATE DATE, Amount Money)
INSERT INTO @Revenue (Id, POSTEDDATE, Amount)
VALUES (1,'6/15/2016',100.00)
,(1,'2/10/2015',5.00)
,(2,'6/15/2016',10.00)
,(2,'1/15/2016',50.00)
,(3,'1/10/2013',10.00)
,(3,'1/10/2012',60.00)
,(1,'2/10/2012',5.00)
,(1,'2/10/2011',5.00)
,(1,'2/10/2014',5.00)

DECLARE @CurrentFiscalYear INT
SET @CurrentFiscalYear = CASE
             WHEN MONTH(GETDATE()) < 7 THEN YEAR(GETDATE()) - 1
             ELSE YEAR(GETDATE())
       END


SELECT
    Id
    ,ConsecutiveYears = YearsDifference
FROM
    (
       SELECT
          a.Id
          ,AFY = a.FiscalYear
          ,BFY = b.FiscalYear
          ,YearsDifference = b.FiscalYear - a.FiscalYear + 1
          ,Ranking = DENSE_RANK() OVER (PARTITION BY a.id ORDER BY (b.FiscalYear - a.FiscalYear) DESC)
       FROM

          (SELECT FiscalYear = CASE
              WHEN MONTH(POSTEDDATE) < 7 THEN YEAR(POSTEDDATE) - 1
              ELSE YEAR(POSTEDDATE)
            END
            ,*
          FROM
            @Revenue)  a

          INNER JOIN 

          (SELECT FiscalYear = CASE
              WHEN MONTH(POSTEDDATE) < 7 THEN YEAR(POSTEDDATE) - 1
              ELSE YEAR(POSTEDDATE)
            END
            ,*
          FROM
            @Revenue)  b       

          ON a.Id = b.Id
          AND b.FiscalYear > a.FiscalYear
       WHERE
          (b.FiscalYear - a.FiscalYear) = (
             SELECT COUNT(*) - 1
             FROM

                (SELECT FiscalYear = CASE
                        WHEN MONTH(POSTEDDATE) < 7 THEN YEAR(POSTEDDATE) - 1
                        ELSE YEAR(POSTEDDATE)
                      END
                ,*
             FROM
                @Revenue)  a1

             WHERE a.Id = a1.Id AND a1.FiscalYear BETWEEN a.FiscalYear AND b.FiscalYear
             )
    ) final
WHERE
    Ranking = 1
    AND BFY = @CurrentFiscalYear


------ and organized as a common table expressions

;WITH cteRevenue AS (
    SELECT
       FiscalYear = CASE
             WHEN MONTH(POSTEDDATE) < 7 THEN YEAR(POSTEDDATE) - 1
             ELSE YEAR(POSTEDDATE)
       END
          ,*
       FROM
          @Revenue
)

, cteConsecutiveYears AS (
    SELECT
          a.Id
          --,AFY = a.FiscalYear
          ,BFY = b.FiscalYear
          ,YearsDifference = b.FiscalYear - a.FiscalYear + 1
          ,Ranking = DENSE_RANK() OVER (PARTITION BY a.id ORDER BY (b.FiscalYear - a.FiscalYear) DESC)
       FROM
          cteRevenue a
          INNER JOIN cteRevenue b      
          ON a.Id = b.Id
          AND b.FiscalYear > a.FiscalYear
       WHERE
          (b.FiscalYear - a.FiscalYear) = (
             SELECT COUNT(*) - 1
             FROM
                cteRevenue a1
             WHERE a.Id = a1.Id AND a1.FiscalYear BETWEEN a.FiscalYear AND b.FiscalYear
             )
) 

SELECT
    Id
    ,ConsecutiveYears = YearsDifference
FROM
    cteConsecutiveYears
WHERE
    Ranking = 1
    AND BFY = @CurrentFiscalYear
Community
  • 1
  • 1
Matt
  • 13,833
  • 2
  • 16
  • 28
  • Well, what I was referring to when I mentioned "from today", I was talking about a SELECT statement that I could use today's date to start going back from and since ID#3 is not within the current fiscal year, it would not be counted. So, since the fiscal year is from 7/1 to 6/30, and this fiscal year is FY2016 until 7/1 when it would be FY17. So, it would start from the current fiscal year and start counting "backwards" until there is a break and if it is larger than "1", then it would count as "consecutive". Thanks, Grim – GrimReaper Jun 15 '16 at 18:42
  • Got It, If you want to count from current fiscal year backwards you will first need to create a FiscalYear Table that has all of the FiscalYears you want and then calculate the fiscalyear column relate and start counting backwards but LAG will still be helpful. To reference the format I used it is Common Table Expressions (https://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx) and very poplular in sql-server these days it is a way to not have to write a sub select. I will edit my answer in light of this new information. – Matt Jun 15 '16 at 18:46
  • Actually, that was the first question that I asked! The last place that I worked had a fiscal year table but I was told that it should be able to be done through using SQL WITHOUT a fiscal year table because it does not exist in the schema that I am working in. – GrimReaper Jun 15 '16 at 18:48
  • Also, I will admit one more thing, I came from using Oracle for a few years so I am trying to get used to SQL Server, too. I have not used "with..." in Oracle before. But I will figured it out. Therefore, I tried what you had suggested, above, to see if it would work, but I am getting an error. The table name is "Revenue" and the columns are ID, POSTDATE, and AMOUNT. The error that I am getting is that it is a syntax error around the FROM statement. – GrimReaper Jun 15 '16 at 18:51
  • SQL that I tried, below: WITH cteFiscalYear AS ( SELECT FiscalYear = CASE WHEN MONTH(POSTDATE) < 7 THEN CAST(DATEFROMPARTS(YEAR(POSTDATE) - 1,7,1) AS DATETIME) ELSE CAST(DATEFROMPARTS(YEAR(POSTDATE),7,1) AS DATETIME) END ,* ) FROM REVENUE ) SELECT * ,ConsecutiveYearOrNot = IIF(LAG(FiscalYear,1,0) OVER (PARTITION BY Id ORDER BY FiscalYear) = DATEADD(YEAR,-1,FiscalYear), 1,0) ,IdRowNum = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY FiscalYear, POSTEDDATE) FROM cteFiscalYear – GrimReaper Jun 15 '16 at 18:52
  • Which SQL Version? It looks I put a ")" into what I copied on here by accident. ,* ) FROM in the cte definition should be ,* FROM. Also ";" in front of the with just terminates prevous SQL statements and is typically used when using Common Table Expressions. I can write it without that. I also have an easy way to generate a FiscalYear table on the fly so it doesn't have to be added to your schema and it is light weight. I will put that in the bottom of my post now. – Matt Jun 15 '16 at 19:05
  • Matt, I really appreciate the help. I am working with what you have given me and trying to get it to work. Already talked to the boss and he told me that since this script with work with millions of rows, I should not create the CTE (had to look that up) and ONLY work directly with the tables, themselves. When I tried to get the code to run using the CTE, I had to stop it after 5 minutes cause it never finished. Thank you for at least a start. Now I just have to figure out how to do it WITHOUT using a CTE or creating a fiscal year table. This is where I am trying to fix it from: – GrimReaper Jun 15 '16 at 20:16
  • WITH cteFiscalYear AS ( SELECT FiscalYear = CASE WHEN MONTH(POSTDATE) < 7 THEN YEAR(POSTDATE) ELSE YEAR(POSTDATE)+1 END ,* FROM REVENUE ) SELECT * ,ConsecutiveYearOrNot = IIF(LAG(FiscalYear,1,0) OVER (PARTITION BY Id ORDER BY FiscalYear) = DATEADD(YEAR,-1,FiscalYear), 1,0) ,IdRowNum = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY FiscalYear, POSTDATE) FROM cteFiscalYear; – GrimReaper Jun 15 '16 at 20:16
  • @GrimReaper FYI I totally went a different direction and posted an answer. I still like the LAG Idea as it is simpler and finds the consecutive years but to limit from the current fy backwards was the not as neat an answer. anyway, check out the newer solution – Matt Jun 15 '16 at 20:23
0

Here's the classic gaps and islands approach:

with
    y as (select distinct ID, year(dateadd(month, -6, POSTEDDATE)) as FY from T),
    r as (select ID, FY, row_number() over (partition by ID order by FY desc) as RN from y)
select ID, min(FY), max(FY)
from r
group by ID, FY + RN
having max(FY) = year(dateadd(month, -6, getdate())) and count(*) > 1;
  1. Get a list of distinct IDs and fiscal years.
  2. Number the rows, resetting per ID. I used descending order since you're probably thinking about it this way but it really doesn't matter.
  3. Use the gaps and islands grouping to find the consecutive blocks of years. Because of the descending sort it uses FY plus RN instead of subtraction. Keep only the block anchored to current fiscal year when it also has more than one year in the span.

http://rextester.com/LNL41306

Optimized approach

You've commented on the size of the table and how performance might be an concern. It kind of depends on the nature of your data but I'm supposing that you may not have a lot of repeat donors and so culling the list of potential donors early might be a good way to improve it. I see two likely possibilities here: first find donors for the current fiscal year and first find repeat donors within the last two fiscal years. When you're early in the fiscal year the first one is probably the better choice.

with
    f1 as (
        select ID from T
        where POSTEDDATE >= datefromparts(year(dateadd(month, -6, getdate())), 7, 1)
    )
    f2 as (
        select ID from T
        where POSTEDDATE >= datefromparts(year(dateadd(month, -6, getdate())) - 1, 7, 1)
        group by ID having count(distinct year(dateadd(month, -6, POSTEDDATE))) = 2
    ),
    y as (
        select distinct ID, year(dateadd(month, -6, POSTEDDATE)) as FY from T
        where ID in (select ID from fN) -- choose f1 or f2
    ), 
    r as (
        select ID, FY, row_number() over (partition by ID order by FY desc) as RN from y
    )
select ID, min(FY), max(FY)
from r
group by ID, FY + RN
having max(FY) = year(dateadd(month, -6, getdate())) and count(*) > 1;

If you don't have datefromparts() you could use dateadd(month, -6, cast(year(getdate()) as date)) and dateadd(month, -6, cast(year(getdate()) - 1 as date)) or some other equivalent expressions for the start of fiscal year.

I was trying to think of a way to get it to take advantage of an index on ID and POSTEDDATE but I don't think SQL Server knows that extracting the year from the translated date won't have changed the ordering. If you have dense_rank() available you might try combining the first two steps and eliminating y completely. By the way y was supposed to mean "years" and r is "ranking".

with r as (
    select distinct ID, year(dateadd(month, -6, POSTEDDATE)) as FY,
        dense_rank() over
            (partition by ID order by year(dateadd(month, -6, POSTEDDATE)) desc) as RN
    from T
) ...

Obsolete (incorrect) approach

Below is the answer I originally wrote for getting results within the last two fiscal years. I re-read the question and realized it didn't address the full problem. It could still be helpful though:

select
    ID,
    /* the rest of these values are just for verification */
    min(year(dateadd(month, -6, POSTEDDATE))) as FY1,
    max(year(dateadd(month, -6, POSTEDDATE))) as FY2,
    min(POSTEDDATE) as firstDate,
    max(POSTEDDATE) as lastDate
from T
where
        /* look at two most recent fiscal years */
        year(dateadd(month, -6, POSTEDDATE)) between
            year(dateadd(month, -6, getdate())) - 1 and year(dateadd(month, -6, getdate()))
        /* sargable filter - never need more than two full years plus one leap day */
    and POSTEDDATE >= dateadd(day, -731, getdate())
group by ID
having count(distinct year(dateadd(month, -6, POSTEDDATE))) = 2;

This query says:

  1. Give me all rows starting 731 days ago and forward. It does not seem to me that you can have anything future-dated but the between in the next step handles it if you do. You could calculate a tighter upper and lower bound if performance is an issue.

  2. Adjust dates backward by six months and strip out the year to calculate the fiscal year number. Only keep the rows that fall into the fiscal year of today's date or the one prior.

  3. Group the results and keep IDs with two distinct fiscal years per group.

shawnt00
  • 16,443
  • 3
  • 17
  • 22