0

I have a rollup table that sums up raw data for a given hour. It looks something like this:

stats_hours:
- obj_id : integer
- start_at : datetime
- count : integer

The obj_id points to a separate table, the start_at field contains a timestamp for the beginning of the hour of the data, and the count contains the sum of the data for that hour.

I would like to build a query that returns a set of data per day, so something like this:

Date       | sum_count
2014-06-01 | 2000
2014-06-02 | 3000
2014-06-03 | 0
2014-06-04 | 5000

The query that I built does a grouping on the date column and sums up the count:

SELECT date(start_at) as date, sum(count) as sum_count 
FROM stats_hours GROUP BY date;

This works fine unless I have no data for a given date, in which case it obviously leaves out the row:

Date       | sum_count
2014-06-01 | 2000
2014-06-02 | 3000
2014-06-04 | 5000

Does anyone know of a good way in SQL to return a zeroed-out row in the case that there is no data for a given date group? Maybe some kind of case statement?

Justin Niessner
  • 242,243
  • 40
  • 408
  • 536
kid_drew
  • 3,857
  • 6
  • 28
  • 38
  • 3
    Which database platform are you using? – Justin Niessner Jul 02 '14 at 19:07
  • PSQL this time, but I'd love a solution that works for MySQL too. – kid_drew Jul 02 '14 at 19:09
  • You'll need to build out a list of dates in one way or another, and then left join over to the results you have here. My preference is to have a 'dim_date' table or something to that extent that simply stores a date as a single row (1 row for every date). Select from the dim_date table, and left join over to the results you have here...left join will ensure that date with 0 rows returns with a null (use null handling to return a 0 instead of a null if needed) – Twelfth Jul 02 '14 at 19:09

1 Answers1

4

You need a full list of dates first, then connect that list to your available dates and group by that. Try the following:

--define start and end limits
Declare @todate datetime, @fromdate datetime
Select @fromdate='2009-03-01', @todate='2014-06-04'

;With DateSequence( Date ) as
(
    Select @fromdate as Date
        union all
    Select dateadd(day, 1, Date)
        from DateSequence
        where Date < @todate
)

--select result
SELECT DateSequence.Date, SUM(Stats_Hours.Count) AS Sum_Count
FROM 
    DateSequence 
     LEFT JOIN 
    Stats_Hours ON DateSequence.Date = Stats_Hours.Start_At
GROUP BY DateSequence.Date
option (MaxRecursion 0)

EDIT: CTE code from this post

AHiggins
  • 7,029
  • 6
  • 36
  • 54
  • +1...same idea as the comment I made, this builds the data table on the fly and then left joins...if you prefer, you can have a static table hold these dates instead. – Twelfth Jul 02 '14 at 19:17
  • 1
    Agree with @Twelfth, a static table is a better option ... but, recognizing that not everyone has control over their DB, and might not be willing to go through the necessary hoops to add the new table, the CTE option is usually what I post. If you can, though, it's great to have a date table and/or numbers table as part of your DB. – AHiggins Jul 02 '14 at 19:19
  • I'm curious why you both think a static table is the better option. What's your logic? Lack of subquery, so better performance? – kid_drew Jul 02 '14 at 19:28
  • A static table is available for all queries on a given database or server, whereas a CTE (Common Table Expression) is limited to a single query. If you want to write two or three stored procedures, you have to make each one include the code, and so forth. A table, on the other hand, is permanent and can be indexed, and could have other data added (for example, a table with a list of dates could have a flag for weekends, or something similar). – AHiggins Jul 02 '14 at 19:30
  • Here's a good post on the advantages of a Date table - http://stackoverflow.com/questions/21312739/what-is-the-advantage-of-using-a-date-dimension-table-over-directly-storing-a-da – AHiggins Jul 02 '14 at 19:31
  • Performance and customization are the two big ones...the CTE here isn't that CPU intensive, but if you have 50 queries all building the same CTE, then it's far quicker to have the table. I also like it for standardizing...a common date table means all your developers are using the same logic...leaving each developer to their own CTE often results in multiple methods employed. – Twelfth Jul 02 '14 at 20:35