2

I have a query where I want to calculate the percentage of sales of a certain product within its category. Thus I calculate the amount sales per product and use an analytic function and partition by category.

SELECT product_id, SUM(sales)/(SUM(sales) OVER(PARTITION BY category))
FROM table1 
GROUP BY product_id

I get this error message:

AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): sum(sales) / ((sum(sales) OVER (PARTITION BY category)))

Anyone that know how to solve this and why it happens?

Energizer1
  • 285
  • 1
  • 5
  • 15
  • 1
    Is ````product_id```` an unique value?, because if so, ````group by product_id```` doesn't make any sense to me. – Chema Jul 10 '20 at 10:06
  • 1
    Could you try ````SELECT product_id, sales / SUM(sales) OVER(PARTITION BY category) FROM table1;````? – Chema Jul 10 '20 at 11:05
  • Correct, product_id is unique. The query works if I write: SELECT product_id, SUM(sales) OVER(PARTITION BY category) FROM table1; If I just write: SELECT product_id, sales / SUM(sales) OVER(PARTITION BY category) FROM table1; Won't it just take the first available sales value for each product_id? (each row in table1 is a product and a date). I think I need to sum the sales over a defined period of time per product_id and divide it by the sum of sales for that category in order to get the right percentage value. – Energizer1 Jul 10 '20 at 11:26
  • 1
    I don't think so, but you could try with a few values, I mean with a small table with ten records for example and calculate by hand the expected result and see what the query produces. It was what I did. Please let me know if you try it. – Chema Jul 10 '20 at 11:38

3 Answers3

0

You need to nest the aggregation function in the window function:

SELECT product_id, category,
       (SUM(sales) /
        SUM( SUM(sales) ) OVER (PARTITION BY category))
-------------^ aggregation function
--------^ window function
       )
FROM table1 
GROUP BY product_id, category;

Combining aggregation and analytic functions looks strange at first (at least it did to me). Just remember that the aggregation functions are evaluated first, so the inner SUM(sales) is the result of the aggregation. The other SUM() is the window function.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I get AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): (sum(sales) / sum(sum(sales)) OVER (PARTITION BY category)) SELECT product_id, ( SUM(sales) / SUM( SUM(sales) ) OVER (PARTITION BY category)) ) FROM table1 GROUP BY product_id. without group by i get: AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): product_id group by 1,2 i get: AnalysisException: GROUP BY expression must not contain aggregate functions: 2 – Energizer1 Jul 11 '20 at 07:43
  • @Energizer1 . . . You just ned `category` in the `group by`. – Gordon Linoff Jul 11 '20 at 10:26
0

I think, I have a good example that fits with your question:

Table example

+----------+---------+---------+--------+
| order_id | cust_id | empl_id | total  |
+----------+---------+---------+--------+
| 1        | c       | 1       | 24.78  |
| 2        | a       | 4       | 28.54  |
| 3        | b       | 3       | 48.69  |
| 4        | b       | 3       | -16.39 |
| 5        | z       | 2       | 29.92  |
| 6        | z       | 3       | 12.50  |
| 7        | a       | 4       | 10.20  |
| 8        | c       | 1       | 5.22   |
+----------+---------+---------+--------+

query

SELECT order_id, total / (SUM(total) OVER(PARTITION BY cust_id)) AS percentage
FROM orders
ORDER BY order_id;

result

+----------+------------+
| order_id | percentage |
+----------+------------+
| 1        | 0.82       |
| 2        | 0.73       |
| 3        | 1.50       |
| 4        | -0.50      |
| 5        | 0.70       |
| 6        | 0.29       |
| 7        | 0.26       |
| 8        | 0.17       |
+----------+------------+

as you can see, for example, for cust_id --> c there are two values 24,78 and 5,22 that sum = 30.00 and percentage is 24,78 --> 0.82 and 5,22 --> 0.17

Please review if it is what you want.

Chema
  • 2,748
  • 2
  • 13
  • 24
0

Impala does not seem to support analytic functions this way, I solved with sub-queries:

SELECT DISTINCT t2.product_id, (t2.sales/t2.tot_sales_cat) AS perc_cat_spend 
FROM( 
        SELECT t1.product_id, SUM(t1.sales) OVER (PARTITION BY t1.category) AS tot_sales_cat, t4.prod_sales 
        FROM table1 as t1 
        JOIN(   SELECT  t1.product_id, SUM(t1.sales) AS prod_sales 
                FROM table1 AS t1 
                GROUP BY 1
            )   AS t4 ON t4.product_id=t1.product_id;
Energizer1
  • 285
  • 1
  • 5
  • 15