SQL Server - Count Distinct outside of group

Question

Suppose I have the following table in my SQL Server (2012) DB:

Tbl1:

Id1:     Id2:      Date:            Value:
1        A         '2018-01-01'     1
1        B         '2018-01-01'     1
1        C         '2018-01-01'     1
1        A         '2018-01-02'     1
1        B         '2018-01-02'     1
1        A         '2018-01-31'     1

And I want to get the average of Value per Id2 where Id1 = 1 for Date BETWEEN '2018-01-01' AND '2018-01-02'.

HOWEVER, for the average, I want to divide by the total number of Dates available, NOT JUST FOR THAT ONE Id2.

So, my resultset would be:

Id2:       AvgVal:
A          1+1 / 2 = 1
B          1+1 / 2 = 1
C          1 / 2 = 0.5

Note that C's value is divided by 2 even though there is only 1 value for C in that date range (in other words, we're dividing by COUNT(DISTINCT Date) OVER (PARTITION BY Id1) only. Not the extra grouping by Id2.

I've come up with a way to do it using CTEs and can think of a way to do it by storing the divisor in a temporary variable, but I'm REALLY trying to think through if there's a way to accomplish this in a single SELECT statement.

My current solution looks as follows:

DECLARE @StartDate DATE = '2018-01-01', @EndDate DATE = '2018-01-02', @Id1 INT = 1;

DECLARE @Tbl1 TABLE ([Id1] INT, [Id2] VARCHAR(1), [Date] DATE, [Value] FLOAT);

INSERT INTO @Tbl1 (Id1, Id2, Date, Value)
VALUES
    (1, 'A','2018-01-01', 1),
    (1, 'B','2018-01-01', 1),
    (1, 'C','2018-01-01', 1),
    (1, 'A','2018-01-02', 1),
    (1, 'B','2018-01-02', 1),
    (1, 'A','2018-01-31', 1)
;

WITH CTE AS
(
    SELECT *
    FROM @Tbl1
    WHERE [Date] BETWEEN @StartDate AND @EndDate
    AND Id1 = @Id1
),

CTE1 AS
(
    SELECT COUNT(DISTINCT [Date]) AS MaxNumDates FROM CTE
)

SELECT
     Id2
    ,SUM(Value) / MaxNumDates AS AvgVal
FROM
    CTE
    JOIN CTE1
    ON 1 = 1
GROUP BY
    Id2,
    MaxNumDates

And, like I said, I figure I could also do it by assigning a local variable to the value of MaxNumDates, but I'm really trying to see if there's a way to do this via a single query with some form of partitioning / aggregation - Assuming, of course, it is a better way to do it.

Any thoughts?

Thanks!

Did the answer in your previous question now do what you needed? The questions look similar enough. https://stackoverflow.com/a/52044975/1073631 — sgeddes, Aug 28 '18 at 17:39
@sgeddes, I'm really leaning towards that, but I'm trying to avoid even the sub-query if it is possible since my real table has multiple group-by criteria not just `Id1`.... But that works a darn charm! — John Bustos, Aug 28 '18 at 17:40
Basically, @sgeddes, if you look at my question, I absolutely implemented the solution from my previous question, I just changed the sub-query to a CTE since I have to use it multiple times for different calculations in my real situation. That's why I'm wondering if there's a better way overall. — John Bustos, Aug 28 '18 at 17:47

score 0 · Answer 1 · answered Aug 28 '18 at 19:50

Since your count(distinct Date) will be the same for whole result set, why NOT just pick it up in a separate variable and use it,

DECLARE @StartDate DATE = '2018-01-01', @EndDate DATE = '2018-01-02', @Id1 INT = 1;

DECLARE @Tbl1 TABLE ([Id1] INT, [Id2] VARCHAR(1), [Date] DATE, [Value] FLOAT);

INSERT INTO @Tbl1 (Id1, Id2, Date, Value)
VALUES
    (1, 'A','2018-01-01', 1),
    (1, 'B','2018-01-01', 1),
    (1, 'C','2018-01-01', 1),
    (1, 'A','2018-01-02', 1),
    (1, 'B','2018-01-02', 1),
    (1, 'A','2018-01-31', 1)
;

-- a variable to hold the count of distinct dates
declare @distinctDates int;

-- here we are reading the value in variable
select  @distinctDates = count(distinct t.[Date])
from    @Tbl1 as t
where   t.[Date] between @StartDate and @EndDate
        and t.Id1 = @id1

select  t.Id2, sum(t.[Value]) as sumOfValue, @distinctDates as distinctDates
        ,  sum(t.[Value]) / @distinctDates as AvgVal 
from    @Tbl1 as t
where   t.[Date] between @StartDate and @EndDate
        and t.Id1 = @id1
group by t.Id2

Thanks, sallushan. As I wrote in the question, I had also thought of doing that, but would really prefer not to have to do that and wanted to see if I could do it all in one query using some form of aggregation / partitioning. The reason why is that this is a very over-simplified example and the `where` clause is actually pretty ugly and may change, so I really don't like having to put it in 2 places in my SQL. But, overall, I'd REALLY like to know if there's a smart, efficient way that can be done as a single statement. — John Bustos, Aug 29 '18 at 14:28
Aaah I didn't read your question fully. Normally I prefer such type of calculations on application end (e.g. C#), provided if result set is NOT too big, another option to go with dynamic SQL (which I don't really like) in-case if you don't want your where clause to be updated on 2 places. If your `where` clause is pretty ugly then I think using a separate variable is a good bet. — sallushan, Aug 31 '18 at 10:30

SQL Server - Count Distinct outside of group

1 Answers1