I have a table TAB with 5 columns:
col1: string ('AAA', 'BBB', ...)
col2: string (null, 'XYZ', ...)
col3: timestamp ('2020-01-01 00:00:00', ...)
col4: string ('A1A', 'B1B', ...)
col5: string ('A2A', null, ...)
Sample data:
WITH TAB AS
(SELECT 'AAA' col1, null col2, TIMESTAMP '2020-01-01 00:00:00' col3, 'A1A' col4, 'A2A' col5
UNION ALL
SELECT 'AAA', null, TIMESTAMP '2020-01-01 00:00:00', null, 'A2A'
UNION ALL
SELECT 'AAA', 'XYZ', TIMESTAMP '2020-01-01 00:00:00', null, 'A2A'
UNION ALL
SELECT 'BBB', 'XYZ', TIMESTAMP '2020-01-02 00:00:00', 'B1B', null )
My query, which doesn't give the intended result:
SELECT
col1,
col2,
DATE(col3) as date,
COUNTIF(NULLIF(TRIM(col4),"") IS NULL) AS col4_isnull_total,
COUNTIF(NULLIF(TRIM(col5),"") IS NULL) AS col5_isnull_total,
COUNTIF(NULLIF(TRIM(col2),"") IS NULL) OVER (PARTITION BY col1, DATE(col3)) AS col2_isnull_total
FROM TAB
WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
GROUP BY col1, col2, DATE(col3)
The result I'm trying to achieve looks like this:
col1 | col2 | date | col4_isnull_total | col5_isnull_total | col2_isnull_total |
---|---|---|---|---|---|
AAA | null | 2020-01-01 | 11 | 15 | 3 |
AAA | XYZ | 2020-01-01 | 30 | 45 | 3 |
AAA | ABC | 2020-01-01 | 10 | 5 | 3 |
AAA | null | 2020-01-02 | 1 | 15 | 1 |
AAA | XYZ | 2020-01-02 | 3 | 45 | 1 |
AAA | ABC | 2020-01-02 | 10 | 5 | 1 |
BBB | null | 2020-01-01 | 7 | 5 | 4 |
BBB | XYZ | 2020-01-01 | 3 | 35 | 4 |
BBB | ABC | 2020-01-01 | 14 | 19 | 4 |
... | ... | ... | ... | ... | ... |
The intended result for the sample data:
col1 | col2 | date | col4_isnull_total | col5_isnull_total | col2_isnull_total |
---|---|---|---|---|---|
AAA | null | 2020-01-01 | 1 | 0 | 2 |
AAA | XYZ | 2020-01-01 | 1 | 0 | 2 |
BBB | XYZ | 2020-01-02 | 0 | 1 | 0 |
So basically, I want to aggregate by col1, col2 and DATE(col3) with GROUP BY to count the nulls in col4 and col5, and also get the number of nulls in col2 per col1 and DATE(col3) using PARTITION BY.
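One way to get both levels is to do the GROUP BY first and apply the window to the grouped rows, so that PARTITION BY only sees plain columns. The sketch below checks that two-step shape locally with Python's built-in sqlite3 (window functions need SQLite 3.25 or newer). It assumes SQLite's dialect: `SUM(CASE WHEN cond THEN 1 ELSE 0 END)` stands in for BigQuery's `COUNTIF(cond)`, and col3 is stored as ISO text instead of a TIMESTAMP. The table name `tab`, the CTE name `grouped` and the helper `run` are illustrative. In BigQuery the same query should work with `COUNTIF` and the original TAB CTE, but that has not been checked here.

```python
import sqlite3

# Sample rows from the question. SQLite has no TIMESTAMP type, so col3 is
# stored as ISO-8601 text (an assumption of this local sketch).
ROWS = [
    ("AAA", None, "2020-01-01 00:00:00", "A1A", "A2A"),
    ("AAA", None, "2020-01-01 00:00:00", None, "A2A"),
    ("AAA", "XYZ", "2020-01-01 00:00:00", None, "A2A"),
    ("BBB", "XYZ", "2020-01-02 00:00:00", "B1B", None),
]

# Step 1 (CTE): one row per (col1, col2, date) with per-group null counts.
# Step 2 (outer SELECT): window over the grouped rows, summing the col2 counts
# per (col1, date). SUM(CASE ...) stands in for BigQuery's COUNTIF.
QUERY = """
WITH grouped AS (
  SELECT col1, col2, DATE(col3) AS dt,
         SUM(CASE WHEN NULLIF(TRIM(col4), '') IS NULL THEN 1 ELSE 0 END) AS col4_isnull_total,
         SUM(CASE WHEN NULLIF(TRIM(col5), '') IS NULL THEN 1 ELSE 0 END) AS col5_isnull_total,
         SUM(CASE WHEN NULLIF(TRIM(col2), '') IS NULL THEN 1 ELSE 0 END) AS col2_isnull_cnt
  FROM tab
  WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
  GROUP BY col1, col2, DATE(col3)
)
SELECT col1, col2, dt AS date, col4_isnull_total, col5_isnull_total,
       -- dt is an ordinary column of the CTE, so PARTITION BY can use it
       SUM(col2_isnull_cnt) OVER (PARTITION BY col1, dt) AS col2_isnull_total
FROM grouped
ORDER BY col1, dt, col2
"""


def run(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT)")
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run(ROWS):
    print(row)
```

For the sample data this returns (AAA, null, 2020-01-01, 1, 0, 2), (AAA, XYZ, 2020-01-01, 1, 0, 2) and (BBB, XYZ, 2020-01-02, 0, 1, 0). Because the window partitions by col1 and DATE(col3) only, both AAA rows for 2020-01-01 carry the same col2 count, 2.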
The error I get is:
PARTITION BY expression references column col3 which is neither grouped nor aggregated
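In a grouped query, window functions are evaluated after GROUP BY, so they see one row per (col1, col2, date) group, and PARTITION BY may only use grouped expressions or aggregates. A consequence is that even if the original query ran, `COUNTIF(...) OVER (...)` would count groups rather than raw rows. Nesting the aggregate inside the window, which in BigQuery would be written `SUM(COUNTIF(...)) OVER (...)`, adds up each group's row count instead. The sketch below compares the two forms in SQLite (an assumption: SQLite lets aggregate queries reference ungrouped columns, so it accepts `DATE(col3)` in PARTITION BY where BigQuery reported the error above; this check says nothing about what BigQuery accepts there). The names `tab`, `naive_total` and `run` are illustrative.

```python
import sqlite3

# Same sample rows as the question; col3 stored as ISO text for SQLite.
ROWS = [
    ("AAA", None, "2020-01-01 00:00:00", "A1A", "A2A"),
    ("AAA", None, "2020-01-01 00:00:00", None, "A2A"),
    ("AAA", "XYZ", "2020-01-01 00:00:00", None, "A2A"),
    ("BBB", "XYZ", "2020-01-02 00:00:00", "B1B", None),
]

# 1 when col2 is null or blank, else 0 (stand-in for COUNTIF's condition).
IS_NULL_COL2 = "CASE WHEN NULLIF(TRIM(col2), '') IS NULL THEN 1 ELSE 0 END"

QUERY = f"""
SELECT col1, col2, DATE(col3) AS date,
       -- window over grouped rows: counts null-col2 groups, not raw rows
       SUM({IS_NULL_COL2}) OVER (PARTITION BY col1, DATE(col3)) AS naive_total,
       -- window over an aggregate: sums each group's raw-row count
       SUM(SUM({IS_NULL_COL2})) OVER (PARTITION BY col1, DATE(col3)) AS col2_isnull_total
FROM tab
WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
GROUP BY col1, col2, DATE(col3)
ORDER BY 1, 3, 2
"""


def run(rows):
    """Load rows into an in-memory table and return both window totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT)")
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run(ROWS):
    print(row)
```

For AAA on 2020-01-01, `naive_total` is 1 (one group with a null col2), while the nested form gives 2 (two raw rows with a null col2).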