0

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.

Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).

SELECT
    ReportId,
    TransactionId,
    CurrentProductionHours,
    ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
                       ORDER BY TransactionTimestamp desc
                      ) AS TransactionByMonth
FROM
    tblSource

The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.

Source Data:

ReportId       TransactionTimestamp     CurrentProductionHours
1              2014-01-05 13:37:00      14.50
1              2014-01-20 09:15:00      15.00
1              2014-01-21 10:20:00      10.00
2              2014-01-22 09:43:00      22.00 
1              2014-02-02 08:50:00      12.00

Target Results:

ReportId     Month     Year     ProductionHours
1            1         2014     10.00
2            1         2014     22.00
1            2         2014     12.00
2            2         2014     22.00

I should also mention that I have a date table available, which can be referenced if required.

** UPDATE 05/03/2014 **

I now have query which is genertating results as shown in the example below but I'm left with islands of data (where a transaction existed in that month) and gaps in between... My question is still similar but in some ways a little more generic - What is the best way to fill gaps between data islands if you have the dataset below as a starting point?

ReportId     Month     Year     ProductionHours
1            1         2014     10.00
1            2         2014     12.00
1            3         2014     NULL
2            1         2014     22.00
2            2         2014     NULL
2            3         2014     NULL 

Any advice about how to tackle this would be greatly appreciated!

triplestones
  • 393
  • 2
  • 13

2 Answers2

0

Try this:

;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp, 
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId, 
       month(dateadd(m, -1, c.TT)) Month, 
       year(dateadd(m, -1, c.TT)) Year, 
       x.CurrentProductionHours 
from c
cross apply
(select top 1 CurrentProductionHours from tblSource 
where TransactionTimestamp < c.TT 
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
t-clausen.dk
  • 43,517
  • 12
  • 59
  • 92
0

A similar approach but using a cartesian to obtain all the combinations of report ids/months. in the first step. A second step adds to that cartesian the maximum timestamp from the source table where the month is less or equal to the month in the current row. Finally it joins the source table to the temp table by report id/timestamp to obtain the latest source table row for every report id/month.

    ;
WITH    allcombinations -- Cartesian (reportid X yearmonth) 
      AS ( SELECT   reportid ,
                    yearmonth
           FROM     ( SELECT DISTINCT
                                reportid
                      FROM      tblSource
                    ) a
                    JOIN ( SELECT DISTINCT
                                    DATEPART(yy, transactionTimestamp)
                                    * 100 + DATEPART(MM,
                                                     transactionTimestamp) yearmonth
                           FROM     tblSource
                         ) b ON 1 = 1
         ),
    maxdates --add correlated max timestamp where the month is less or equal to the month in current record
      AS ( SELECT   a.* ,
                    ( SELECT    MAX(transactionTimestamp)
                      FROM      tblSource t
                      WHERE     t.reportid = a.reportid
                                AND DATEPART(yy, t.transactionTimestamp)
                                * 100 + DATEPART(MM,
                                                 t.transactionTimestamp) <= a.yearmonth
                    ) maxtstamp
           FROM     allcombinations a
         )
-- join previous data to the source table by reportid and timestamp 
SELECT  distinct m.reportid ,
        m.yearmonth ,
        t.CurrentProductionHours
FROM    maxdates m
        JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
        m.yearmonth
Jayvee
  • 10,670
  • 3
  • 29
  • 40
  • I tested your solution and it doesn't give the right result with more test rows – t-clausen.dk Mar 04 '14 at 15:22
  • oops, too bad; can I have the test set you used?; I can try and fix my query. thanks. – Jayvee Mar 04 '14 at 15:29
  • All you have to do it create a table like in the question and add rows to it. It fails pretty often. – t-clausen.dk Mar 04 '14 at 15:37
  • I added a distinct and the report id condition to the join in case of duplicated timestamps – Jayvee Mar 04 '14 at 16:19
  • no... fixing a failed query is rarely done with distinct. Last time I checked it was still wrong. And I already had tested joining on reportid. Distinct will not fix it. – t-clausen.dk Mar 04 '14 at 17:08
  • it seems that you have some scenarios in your test data that i'm omitting because with my test set I get the same results using both queries, yours and mine. – Jayvee Mar 04 '14 at 17:15
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/48979/discussion-between-t-clausen-dk-and-jayvee) – t-clausen.dk Mar 04 '14 at 17:16