-1

I have a table TAB with 5 columns,
col1: string ('AAA', 'BBB', ...)
col2: string (null, 'XYZ', ...)
col3: timestamp ('2020-01-01 00:00:00', ...)
col4: string ('A1A', 'B1B', ...)
col5: string ('A2A', null, ...)

Sample Data

WITH TAB AS 
(SELECT 'AAA' col1, null col2, TIMESTAMP '2020-01-01 00:00:00' col3, 'A1A' col4, 'A2A' col5    
UNION ALL   
SELECT 'AAA', null, TIMESTAMP '2020-01-01 00:00:00', null, 'A2A'  
UNION ALL   
SELECT 'AAA', 'XYZ', TIMESTAMP '2020-01-01 00:00:00', null, 'A2A' 
UNION ALL  
SELECT 'BBB', 'XYZ', TIMESTAMP '2020-01-02 00:00:00', 'B1B', null )

My query which didn't give intended result,

SELECT
  col1,
  col2,
  DATE(col3) as date,
  COUNTIF(NULLIF(TRIM(col4),"") IS NULL) AS col4_isnull_total,
  COUNTIF(NULLIF(TRIM(col5),"") IS NULL) AS col5_isnull_total,
  COUNTIF(NULLIF(TRIM(col2),"") IS NULL) OVER (PARTITION BY col1, DATE(col3) AS col2_isnull_total
FROM TAB
WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
GROUP BY col1, col2, DATE(col3)

The result I'm trying to achieve is,

col1 col2 date col4_isnull_total col5_isnull_total col2_isnull_total
AAA null 2020-01-01 11 15 3
AAA XYZ 2020-01-01 30 45 3
AAA ABC 2020-01-01 10 5 3
AAA null 2020-01-02 1 15 1
AAA XYZ 2020-01-02 3 45 1
AAA ABC 2020-01-02 10 5 1
BBB null 2020-01-01 7 5 4
BBB XYZ 2020-01-01 3 35 4
BBB ABC 2020-01-01 14 19 4
... ... ... ... ... ...

Intended result from the Sample Data,

col1 col2 date col4_isnull_total col5_isnull_total col2_isnull_total
AAA null 2020-01-01 1 0 2
AAA XYZ 2020-01-01 1 0 1
BBB XYZ 2020-01-02 0 1 0

So, basically I want aggregate by col1, col2 and DATE(col3) and find number of nulls using group by and also the number of nulls in col2 by col1 and DATE(col3) using partition by

The error I get is,
PARTITION BY expression references column col3 which is neither grouped nor aggregated

Sam Gladio
  • 89
  • 1
  • 1
  • 4

1 Answers1

0

You might consider below query.

WITH TAB AS (
  SELECT 'AAA' col1, null col2, TIMESTAMP '2020-01-01 00:00:00' col3, 'A1A' col4, 'A2A' col5    UNION ALL
  SELECT 'AAA', null, TIMESTAMP '2020-01-01 00:00:00', null, 'A2A'    UNION ALL   
  SELECT 'AAA', 'XYZ', TIMESTAMP '2020-01-01 00:00:00', null, 'A2A'    UNION ALL
  SELECT 'BBB', 'XYZ', TIMESTAMP '2020-01-02 00:00:00', 'B1B', null
)
SELECT col1,
       col2,
       DATE(col3) AS date,
       COUNTIF(NULLIF(TRIM(col4),"") IS NULL) AS col4_isnull_total,
       COUNTIF(NULLIF(TRIM(col5),"") IS NULL) AS col5_isnull_total,
       ANY_VALUE(col2_isnull_total) AS col2_isnull_total,
  FROM (
    SELECT *,
           COUNTIF(NULLIF(TRIM(col2),"") IS NULL) OVER (PARTITION BY col1, DATE(col3)) AS col2_isnull_total
      FROM TAB
     WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
  )
 GROUP BY 1, 2, 3;

enter image description here

Below is a query without using subquery (not recommended. not readable, subject easily to be broken)

SELECT col1,
       col2,
       DATE(col3) AS date,
       COUNTIF(NULLIF(TRIM(col4),"") IS NULL) AS col4_isnull_total,
       COUNTIF(NULLIF(TRIM(col5),"") IS NULL) AS col5_isnull_total,
       IFNULL(FIRST_VALUE(IF(NULLIF(TRIM(col2),"") IS NULL, COUNTIF(NULLIF(TRIM(col2),"") IS NULL), NULL) IGNORE NULLS) OVER w, 0) AS col2_isnull_total2
  FROM TAB
 WHERE DATE(col3) BETWEEN '2020-01-01' AND '2020-01-31'
 GROUP BY 1, 2, 3
WINDOW w AS (PARTITION BY col1, ANY_VALUE(DATE(col3)) ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
;
Jaytiger
  • 11,626
  • 2
  • 5
  • 15
  • I tried the ANY_VALUE(), it didn't work. Please execute your suggestion for the following, the count for col2 should be 2 for col1 'AAA' but instead it's 1. `WITH TAB AS ( SELECT 'AAA' col1, null col2, TIMESTAMP '2020-01-01 00:00:00' col3, 'A1A' col4, 'A2A' col5 UNION ALL SELECT 'AAA', null, TIMESTAMP '2020-01-01 00:00:00', null, 'A2A' UNION ALL SELECT 'AAA', 'XYZ', TIMESTAMP '2020-01-01 00:00:00', null, 'A2A' UNION ALL SELECT 'BBB', 'XYZ', TIMESTAMP '2020-01-02 00:00:00', 'B1B', null )` – Sam Gladio Dec 05 '22 at 20:18
  • Would you update the question with your sample and expected result as well ? I think you want the window function is evaluated before the aggregation function. – Jaytiger Dec 05 '22 at 20:42
  • are you able to explain why *col2_isnull_total* of second row is `1` , not `2` in your output? It seems little bit different what I've updated. – Jaytiger Dec 06 '22 at 10:12
  • Yeah, it should be 2, thanks. Are there no other ways apart from the nested query solution like having both under the same level of the query. – Sam Gladio Dec 06 '22 at 13:23
  • Since an aggregate function is evaluated before a window function, normally it's not possible. But in your specific case, there might be a workaround that I don't personally recommended. – Jaytiger Dec 06 '22 at 13:56
  • I've added another version which doesn't use a subquery. kindly check that out. – Jaytiger Dec 06 '22 at 14:06