Counts on Date Range and Individual Days

Question

I'm having to re-write a project that was done using a combination of SQL queries and Query-of-Queries in ColdFusion. There were dozens of queries referencing the original SQL Query results set, but it wasn't abstracted to work for different events. So I would like to improve it by moving most of the counting into SQL. I got the first 6 counts they need working (not sure if in the most optimal way). But, in addition to those, I need to be able to do a break down not just on the overall date range, but also for each individual day in that date range for the unique counts.

So far the query is:

 SELECT Count(CASE
           WHEN type IN ( 1, 3, 4, 5, 9 ) THEN barcode
           ELSE NULL
         END)              AS total_scans,
   Count(CASE
           WHEN type IN ( 2, 8 ) THEN barcode
           ELSE NULL
         END)              AS total_creds,
   Count(barcode)          AS total_scans,
   Count(DISTINCT CASE
                    WHEN type IN ( 1, 3, 4, 5, 9 ) THEN barcode
                    ELSE NULL
                  END)     AS unique_scans,
   Count(DISTINCT CASE
                    WHEN type IN ( 2, 8 ) THEN barcode
                    ELSE NULL
                  END)     AS unique_creds,
   Count(DISTINCT barcode) AS unique_scans 
FROM   (SELECT c.id,
           a.barcode,
           d.type,
           c.location,
           Datepart(mm, a.scan_time)     AS scan_month,
           Datepart(dd, a.scan_time)     AS scan_day,
           Datepart(hour, a.scan_time)   AS scan_hour,
           Datepart(minute, a.scan_time) AS scan_min
    FROM   [scan_11pc_gate_entries] AS a
           INNER JOIN scan_units AS b
                   ON a.scanner = b.id
           INNER JOIN scan_gates AS c
                   ON b.gate = c.id
           INNER JOIN [scan_11pc_gate_allbarcodes] AS d
                   ON a.barcode = d.barcode
    WHERE  ( c.id IN (SELECT id
                      FROM   scan_gates
                      WHERE  ( event_id = 21 )) )
           AND ( a.valid IN ( 1, 8 ) )
           AND a.scan_time >= '20110808'
           AND a.scan_time <= '20110814') data

What is the data type of `scan_11pc_gate_entries.scan_time`? And do you find meaningless aliases like `a`,`b`,`c` etc. readable? — Aaron Bertrand, Mar 29 '13 at 19:38
I didn't write the inner query that gets the core data, the data type is datetime. And I suppose I could re-write the inner query as well, but I really just need the counts — pixelwiz, Mar 29 '13 at 20:04
Well the easiest way to get counts by day will require at least a slight change to the inner query, unless you want to group by these separated day/month/year columns. Did you try my answer? — Aaron Bertrand, Mar 29 '13 at 20:07

Aaron Bertrand · Accepted Answer · 2018-11-11T14:11:49.107

0

Well I see plenty of other room for improvement / optimization here, but to satisfy the immediate requirement:

 SELECT d, Count(CASE WHEN type IN ( 1, 3, 4, 5, 9 ) THEN barcode
           ELSE NULL END) AS total_scans,
   Count(CASE WHEN type IN ( 2, 8 ) THEN barcode
           ELSE NULL END) AS total_creds,
   Count(barcode)         AS total_scans,
   Count(DISTINCT CASE WHEN type IN ( 1, 3, 4, 5, 9 ) THEN barcode
           ELSE NULL END) AS unique_scans,
   Count(DISTINCT CASE WHEN type IN ( 2, 8 ) THEN barcode
           ELSE NULL END) AS unique_creds,
   Count(DISTINCT barcode) AS unique_scans 
FROM   (SELECT c.id, -- is this column necessary?
           a.barcode,
           d.type,
           c.location, -- is this column necessary?
           d = DATEADD(DAY, DATEDIFF(DAY, 0, a.scan_time), 0)
    FROM   [scan_11pc_gate_entries] AS a
           INNER JOIN scan_units AS b
                   ON a.scanner = b.id
           INNER JOIN scan_gates AS c
                   ON b.gate = c.id
                  AND c.event_id = 21 -- join criteria, 
                     -- shouldn't be an extra IN clause
           INNER JOIN [scan_11pc_gate_allbarcodes] AS d
                   ON a.barcode = d.barcode
    WHERE  ( a.valid IN ( 1, 8 ) )
           AND a.scan_time >= '20110808'
           AND a.scan_time <= '20110814') data  
GROUP BY d
ORDER BY d;

Note that >= and <= is the same as BETWEEN, and unless scan_time is a DATE column (or guaranteed to always be at midnight), the approach you're using isn't safe. Better to say:

           AND a.scan_time >= '20110808'
           AND a.scan_time <  '20110815'

More info here:

What do BETWEEN and the devil have in common?

EDIT

Understanding that this is based on the information I've pieced together from comments and questions, which is far from a complete picture of your environment and workload, an indexed view you may find useful would be:

CREATE VIEW dbo.myview 
WITH SCHEMABINDING
AS
  SELECT c.event_id,
         a.barcode,
         [type] = CASE WHEN d.[type] IN (2,8) THEN 'c' ELSE 's' END,
         scan_date = DATEADD(DAY, DATEDIFF(DAY, 0, a.scan_time), 0),
         total = COUNT_BIG(*)
    FROM dbo.scan_11pc_gate_entries AS a
    INNER JOIN dbo.can_units AS b
            ON a.scanner = b.id
    INNER JOIN dbo.scan_gates AS c
            ON b.gate = c.id
    INNER JOIN dbo.scan_11pc_gate_allbarcodes AS d
            ON a.barcode = d.barcode
    WHERE (a.valid IN (1,8))
      AND d.[type] IN (2,3,4,5,8,9)
    GROUP BY 
      c.event_id,
      a.barcode,
      CASE WHEN d.[type] IN (2,8) THEN 'c' ELSE 's' END,
      DATEADD(DAY, DATEDIFF(DAY, 0, a.scan_time), 0);
GO
CREATE UNIQUE CLUSTERED INDEX x 
  ON dbo.myview(event_id, barcode, [type], scan_date);

Now you can write a query that does something like this:

SELECT [date] = CONVERT(CHAR(8), scan_date, 112), 
  SUM(CASE WHEN [type] = 's' THEN total ELSE 0 END) AS total_scans,
  SUM(CASE WHEN [type] = 'c' THEN total ELSE 0 END) AS total_creds,
  COUNT(CASE WHEN [type] = 's' THEN 1 END) AS unique_scans,
  COUNT(CASE WHEN [type] = 'c' THEN 1 END) AS unique_creds
FROM dbo.myview WITH (NOEXPAND) -- in case STD Edition
WHERE event_id = 21
  AND scan_date BETWEEN '20110808' AND '20110814'
GROUP BY scan_date
UNION ALL
SELECT [date] = 'weekly', 
  SUM(CASE WHEN [type] = 's' THEN total ELSE 0 END) AS total_scans,
  SUM(CASE WHEN [type] = 'c' THEN total ELSE 0 END) AS total_creds,
  COUNT(DISTINCT CASE WHEN [type] = 's' THEN barcode END) AS unique_scans,
  COUNT(DISTINCT CASE WHEN [type] = 'c' THEN barcode END) AS unique_creds
FROM dbo.myview WITH (NOEXPAND) -- in case STD Edition
WHERE event_id = 21
  AND scan_date BETWEEN '20110808' AND '20110814'
ORDER BY [date];

This is all completely untested as your schema is a little bit cumbersome to try and create a complete repro of your system, but hopefully this gives you a general idea to work from...

edited Nov 11 '18 at 14:11

answered Mar 29 '13 at 19:43

Aaron Bertrand

272,866
37
466
490

Thanks for helping with the date range, I read your article just before writing this, but I guess I didn't quite get it. I did get rid of the old way of putting a string with milliseconds etc. – pixelwiz Mar 29 '13 at 20:10
But I think I'm missing something, now it says "Msg 243, Level 16, State 1, Line 1 Type DATE is not a defined system type." The solution seems really clever though! – pixelwiz Mar 29 '13 at 20:10
Can you tell us what this returns: `SELECT SERVERPROPERTY('ProductVersion'), compatibility_level FROM sys.databases WHERE database_id = DB_ID();` – Aaron Bertrand Mar 29 '13 at 20:14
Oh, I think I see half the problem, I don't think the SQL in our Dev environment has been upgraded past 2005 yet. – pixelwiz Mar 29 '13 at 20:14
Your dev environment is 3 major versions behind production? That's not good. – Aaron Bertrand Mar 29 '13 at 20:15
I just emailed the networking guys about that, maybe I'm just pointing to the wrong one, haven't worked on this site in years. So I got it to work with d = CONVERT(VARCHAR(8), a.scan_time, 112), but do I need to add WITH ROLLUP or something to it to actually get the totals? Also, I only really need the uniques per day not all of them. Can you give me a hint what other things you'd optimize, sounds like you really know this stuff well. – pixelwiz Mar 29 '13 at 20:25
Did you see that I updated my answer? There is a better approach than converting to a string. You can add `WITH ROLLUP` to the outer query but I have no idea what your exact output requirements are. As for the rest it's way too much for a comment thread on a Friday afternoon, sorry. – Aaron Bertrand Mar 29 '13 at 20:26
Ok, if I run the this query group by day, and then sum together all the day to get the unique scans, that comes out wrong, because we want the unique to be across all days. So would I have to run the query twice to get that part too, or is there a way to do it all in one? – pixelwiz Apr 01 '13 at 13:53
@user912627 can you show sample data and desired results? Word problems get tedious very quickly. – Aaron Bertrand Apr 01 '13 at 13:55
Well, sample result yes, here is a screen grab of what the old queries produce: http://images.pgalinks.com/sampleresults.jpg Sample data is much harder, since it's like 200k rows in multiple tables. The inner query result looks something like this: id gete_id, barcode, type, location, d_date 68 WCTH02743 4 Main Gate 2011-08-11 00:00:00.000 73 GRSA17936 4 Caddie Gate 2011-08-13 00:00:00.000 68 GRSA14978 4 Main Gate 2011-08-13 00:00:00.000 68 GRSU17249 4 Main Gate 2011-08-14 00:00:00.000 – pixelwiz Apr 01 '13 at 14:13
The inner query result looks something like this: http://images.pgalinks.com/innerqueryresults.jpg – pixelwiz Apr 01 '13 at 14:19
Is there a better general approach to doing counts on columns that require distinct? This query is averaging 6 seconds on about 150k rows of data. I'd like to get it under 1 second. But even with indexes added it didn't improve much or doing the inner query as an indexed view. – pixelwiz Apr 03 '13 at 13:31
Indexed views are not magic speed boosters. The inner query as an indexed view is going to expose the same number of rows, so when you query against it, you're still going to have to compute all of the aggregations and distincts at query time. I will take a look at a potential indexed view for you later today. – Aaron Bertrand Apr 03 '13 at 13:34
Aaron, I have one more question. So turns out I have to write a very similar report to the one above, but this one would be groped by gate_id, with just 2 counts, unique tickets and unique credentials for each gate. I know how to do it as a separate query, but the inner part of it is exactly the same. Is there any way to do it on one query? Somehow return 2 data sets? – pixelwiz Apr 03 '13 at 17:53
@pixelwiz if they're the same shape, sure, you could use the inner query as a CTE and then perform a union all (adding a column to indicate which set it came from). – Aaron Bertrand Apr 03 '13 at 18:17
@pixelwiz I've updated with a potential indexed view that you could use to pre-aggregate *some* of this information, then a union all query to get you the rest of the way. As I suggested in the answer, this is not necessarily cut & paste code, as it is completely untested (obviously) with your schema. – Aaron Bertrand Apr 04 '13 at 04:05
Thank you Aaron, that is so clever to do a total=count(*) and to have a case to determine the ticket type and then do sum and count on total. WOW. It at least looks much better than the way I'm doing it. Now I just got to implement, test and compare performance ;) – pixelwiz Apr 04 '13 at 12:28
Wow, huge improvement. My original query was taking about 6 seconds, now it's down to 1.5! THANK YOU! – pixelwiz Apr 04 '13 at 18:07
@pixelwiz just remember that indexed views aren't free; you'll want to test your whole workload as the maintenance of that index will potentially have an effect on the write portions of your workload. I've also added the `WITH (NOEXPAND)` hint in case you are not on Enterprise Edition - if you're on standard you should try the queries again with that hint in place. – Aaron Bertrand Apr 04 '13 at 18:09
Thank you again, I didn't actually use it as a view, just as an Inner Query for now. But I did run into one issue, the Total Uniques for all the days if I do it the last way is fast, but it's not a total of uniques for the week, it's a sum of totals of uniques for each day. Which is actually ok for tickets but not for credentials. So I had to combine the two, and it made it 4 seconds. So still better than 6 but not as good as 1, which is what it was without having to do a count on distinct barcodes. – pixelwiz Apr 08 '13 at 14:23
@pixelwiz this is exactly why sample data and desired results are much for helpful for everyone involved than word problems and a query. – Aaron Bertrand Apr 08 '13 at 15:14

Counts on Date Range and Individual Days

1 Answers1