1

Trying to execute this in Athena SQL. I have '75' as default tax_type which i need to replace with '76' for 20% of tax_amt

Current output as is

geocode  tax_type  tax_amt
32         75       10
32         75       15
32         75       20
32         75       18
32         75       20
32         75       17

I need output as

geocode  tax_type  tax_amt
32         75       10
32         75       15
32         75       20
32         75       18
32         76       20
32         75       17

I keep getting below error:

"GROUP BY clause cannot contain aggregations or window functions: ["count"("tax_amt"), "count"(), "count"("tax_amt"), "count"()]"

If i remove from group by clause then it says

"must be an aggregate expression or appear in GROUP BY clause"

Any help would be really appreciated

SQL

   WITH a AS
   (SELECT * 
    FROM "dl_wireless_boost"."ngpp_tax_fact" 
    WHERE dt = (SELECT MAX(dt) 
                FROM "dl_wireless_boost"."ngpp_tax_fact")
   )
, b AS 
    (SELECT g.geocode_id
         , g.geocode
         , coalesce(sq_actual_names.actual_name, g.jurisdiction_name) AS JUR_NAME
    FROM "dl_wireless_boost"."ngpp_geocode_dim" g 
    LEFT JOIN 
       (SELECT REPLACE(geocode,'-','') AS stripped_geocode
               , MAX(jurisdiction_name) AS actual_name
        FROM "dl_wireless_boost"."ngpp_geocode_dim" 
        WHERE 1=1 AND dt = (SELECT MAX(dt) 
                            FROM "dl_wireless_boost"."ngpp_geocode_dim")
        AND (geocode = geocode OR geocode = SUBSTRING(geocode,1,2) || '-' || 
                                            SUBSTRING(geocode,3,3) || '-' || 
                                            SUBSTRING(geocode,6,4))
        AND jurisdiction_name <> 'U'
        GROUP BY 1) sq_actual_names
     ON g.geocode = sq_actual_names.stripped_geocode
     WHERE 1=1 
       AND g.dt = (SELECT MAX(dt) FROM "dl_wireless_boost"."ngpp_geocode_dim")
       AND (geocode = geocode OR geocode = SUBSTRING(geocode,1,2) || '-' || 
                                           SUBSTRING(geocode,3,3) || '-' || 
                                           SUBSTRING(geocode,6,4))
   )
, d AS 
    (SELECT * 
     FROM "dl_wireless_boost"."ngpp_tax_code_dim" 
     WHERE dt = (SELECT MAX(dt) 
                 FROM "dl_wireless_boost"."ngpp_tax_code_dim")
    )   
, agg AS
    (SELECT 
          CASE 
               WHEN d.tax_auth = '1' THEN SUBSTRING(b.geocode,1,2) || '0000000' 
               WHEN d.tax_auth = '2' THEN SUBSTRING(b.geocode,1,5) || '0000' 
               ELSE b.geocode 
          END AS GEOCODE
         , b.JUR_NAME AS JURIS_NAME
         , CAST(COUNT(a.tax_amt)/COUNT(*)*100 AS DECIMAL(30,2)) AS PERCENTILE
         , d.tax_type 
         , CAST (d.tax_rate AS DECIMAL (30,5)) AS TAX_RATE
         , SUM(CAST(a.tax_amt AS DECIMAL(30,2))) AS TAX
         , SUBSTRING(b.geocode,1,2) AS STATE_GEO
         , 'PPD Boost Usage' AS ID
         , d.tax_auth AS TAX_AUTH
         , f.auth_name AS AUTH_NAME
         , CASE 
              WHEN SUBSTRING(b.geocode,1,2) = '72'
              THEN '417' 
              ELSE '317' 
          END AS BU
        , c.tax_descrp AS TAX_DESCRP
        , CASE 
              WHEN b.geocode LIKE '%-%-%' 
              THEN b.geocode 
              ELSE SUBSTRING(geocode,1,2) || '-' || 
                   SUBSTRING(geocode,3,3) || '-' || 
                   SUBSTRING(geocode,6,4) 
          END AS GEOMASK 
        , 'NO' AS COST_RECOVERY_IDENTIFIER
        , CONCAT(SUBSTRING(b.geocode,1,2),d.tax_auth,d.tax_type) AS GL_CROSS_REF
        , SUBSTRING(load_id,1,6) AS YYYYMM
        , c.gl_account AS GL_ACCOUNT
        , d.tax_srv_type

    FROM a
    LEFT JOIN b
      ON a.geocode_id = b.geocode_id

    LEFT JOIN  d    
      ON a.tax_code_id = d.tax_code_id

    LEFT JOIN "sbx_accntnganalytics"."tax_auth_name" f
       ON d.tax_auth = f.tax_auth

    LEFT JOIN "sbx_accntnganalytics"."tax_gl_desc" c
       ON d.tax_type = c.tax_type 
       AND SUBSTRING(b.geocode,1,2) = c.state_geo

    GROUP BY 1,2,4,5,7,8,9,10,11,12,13,14,15,16,17,18
)
SELECT GEOCODE
       , JURIS_NAME
       , PERCENTILE
       , CASE 
                when STATE_GEO = '32' AND tax_type = '75' AND CAST(COUNT(tax)/COUNT(*)*100 AS DECIMAL(30,2)) = 0.20 
                THEN REPLACE (tax_type,'75','76') else tax_type end as tax_type
       , TAX_RATE
       , TAX
       , STATE_GEO
       , TAX_AUTH
       , AUTH_NAME
       , BU
       , TAX_DESCRP
       , GEOMASK 
       , COST_RECOVERY_IDENTIFIER
       , GL_CROSS_REF
       , YYYYMM
       , GL_ACCOUNT
       , tax_srv_type
FROM agg
 
Harry
  • 11
  • 4
  • Getting below error @Parfait:SYNTAX_ERROR: line 5:1: GROUP BY clause cannot contain aggregations or window functions: ["count"("a"."tax_amt"), "count"(*), "count"("a"."tax_amt"), "count"(*)] – Harry Sep 23 '20 at 21:20
  • @Parfait: i have edited my SQL in the main text box.Please help thanks – Harry Sep 24 '20 at 14:16

2 Answers2

0

I suspect that you want:

select geocode, tax_type, 
    case when ntile(5) over(order by id) = 1 then 76 else tax_amt end tax_amt
from mytable

This assumes that column id can be used to sort the records. The query identifies the top 20% records with window function ntil(), and assigns value 76 to column tax_amt.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • @Harry: no. `ntile()` distributes the rows in 5 groups of even size, no matter the number of rows. – GMB Sep 22 '20 at 23:07
  • @Harry: obviously this is very different than the query I gave you. An obviously as well, comments are not meant to provide such large code blocks. I can't suggest here. – GMB Sep 24 '20 at 09:58
  • i have edited my SQL in the main text box.Please help thanks – Harry Sep 24 '20 at 14:15
0

Because your are combining a grouping column with aggregate in same expression, you run into the GROUP BY error. Consider aggregating first then run your CASE expression which can be facilitated with common table expressions (CTE) using WITH clause. You can even flatten your CASE with multiple WHEN logical expression.

WITH a AS
   (SELECT * 
    FROM "dl_wireless_boost"."ngpp_tax_fact" 
    WHERE dt = (SELECT MAX(dt) 
                FROM "dl_wireless_boost"."ngpp_tax_fact")
   )
, b AS 
    (SELECT g.geocode_id
         , g.geocode
         , coalesce(sq_actual_names.actual_name, g.jurisdiction_name) AS JUR_NAME
    FROM "dl_wireless_boost"."ngpp_geocode_dim" g `enter code here`
    LEFT JOIN 
       (SELECT REPLACE(geocode,'-','') AS stripped_geocode
               , MAX(jurisdiction_name) AS actual_name
        FROM "dl_wireless_boost"."ngpp_geocode_dim" 
        WHERE 1=1 AND dt = (SELECT MAX(dt) 
                            FROM "dl_wireless_boost"."ngpp_geocode_dim")
        AND (geocode = geocode OR geocode = SUBSTRING(geocode,1,2) || '-' || 
                                            SUBSTRING(geocode,3,3) || '-' || 
                                            SUBSTRING(geocode,6,4))
        AND jurisdiction_name <> 'U'
        GROUP BY 1) sq_actual_names
     ON g.geocode = sq_actual_names.stripped_geocode
     WHERE 1=1 
       AND g.dt = (SELECT MAX(dt) FROM "dl_wireless_boost"."ngpp_geocode_dim")
       AND (geocode = geocode OR geocode = SUBSTRING(geocode,1,2) || '-' || 
                                           SUBSTRING(geocode,3,3) || '-' || 
                                           SUBSTRING(geocode,6,4))
   )
, d AS 
    (SELECT * 
     FROM "dl_wireless_boost"."ngpp_tax_code_dim" 
     WHERE dt = (SELECT MAX(dt) 
                 FROM "dl_wireless_boost"."ngpp_tax_code_dim")
    )   
, agg AS
    (SELECT 
          CASE 
               WHEN d.tax_auth = '1' THEN SUBSTRING(b.geocode,1,2) || '0000000' 
               WHEN d.tax_auth = '2' THEN SUBSTRING(b.geocode,1,5) || '0000' 
               ELSE b.geocode 
          END AS GEOCODE
         , b.JUR_NAME AS JURIS_NAME
         , CAST(COUNT(a.tax_amt)/COUNT(*)*100 AS DECIMAL(30,2)) AS PERCENTILE,
         , d.tax_type
         , CAST (d.tax_rate AS DECIMAL (30,5)) AS TAX_RATE
         , SUM(CAST(a.tax_amt AS DECIMAL(30,2))) AS TAX
         , SUBSTRING(b.geocode,1,2) AS STATE_GEO
         , 'PPD Boost Usage' AS ID
         , d.tax_auth AS TAX_AUTH
         , f.auth_name AS AUTH_NAME
         , CASE 
              WHEN SUBSTRING(b.geocode,1,2) = '72'
              THEN '417' 
              ELSE '317' 
          END AS BU
        , c.tax_descrp AS TAX_DESCRP
        , CASE 
              WHEN b.geocode LIKE '%-%-%' 
              THEN b.geocode 
              ELSE SUBSTRING(geocode,1,2) || '-' || 
                   SUBSTRING(geocode,3,3) || '-' || 
                   SUBSTRING(geocode,6,4) 
          END AS GEOMASK 
        , 'NO' AS COST_RECOVERY_IDENTIFIER
        , CONCAT(SUBSTRING(b.geocode,1,2),d.tax_auth,d.tax_type) AS GL_CROSS_REF
        , SUBSTRING(load_id,1,6) AS YYYYMM
        , c.gl_account AS GL_ACCOUNT
        , d.tax_srv_type

    FROM a
    LEFT JOIN b
      ON a.geocode_id = b.geocode_id

    LEFT JOIN  d    
      ON a.tax_code_id = d.tax_code_id

    LEFT JOIN "sbx_accntnganalytics"."tax_auth_name" f
       ON d.tax_auth = f.tax_auth

    LEFT JOIN "sbx_accntnganalytics"."tax_gl_desc" c
       ON d.tax_type = c.tax_type 
       AND SUBSTRING(b.geocode,1,2) = c.state_geo

    GROUP BY 1,2,4,5,7,8,9,10,11,12,13,14,15,16,17,18
)

SELECT GEOCODE
       , JURIS_NAME
       , PERCENTILE
       , CASE 
                WHEN STATE_GEO = '32' AND d.tax_type = '75' AND PERCENTILE = 0.20 
                THEN REPLACE (d.tax_type,'75','76') 
                WHEN (STATE_GEO <> '32' OR d.tax_type <> '75') AND PERCENTILE = 0.80 
                THEN d.tax_type 
                ELSE NULL
         END AS TAX_TYPE
       , TAX_RATE
       , TAX
       , STATE_GEO
       , ID
       , TAX_AUTH
       , AUTH_NAME
       , BU
       , TAX_DESCRP
       , GEOMASK 
       , COST_RECOVERY_IDENTIFIER
       , GL_CROSS_REF
       , YYYYMM
       , GL_ACCOUNT
       , tax_srv_type
FROM agg

Aside - be sure to heed Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3) with more informative aliases.

Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Hello @Parfait, i tried using your solution but am still getting the same issue. Please see modified code in main text box: – Harry Sep 28 '20 at 16:43