1

I am trying to calculalate maximum of column values with common id.

I have following table as input

TABLE 1:

| id    | seq    | score |
| ----- | ------ | ----- |
| UA502 | qrst   | 8.2   |
| UA502 | abcdef | 2.2   |
| UA504 | yzab   | 8.8   |
| UA504 | lmnop  | 2.8   |
| UA503 | uvwx   | 8.6   |
| UA503 | ghijk  | 2.6   |

The desired output is:

| id    | seq    | score |
| ----- | ------ | ----- |
| UA502 | qrst   | 8.2   |
| UA504 | yzab   | 8.8   |
| UA503 | uvwx   | 8.6   |

I am running following WITH query (max_calc) with groupby and max function on the output of another WITH query (union_data; TABLE 1).

max_calc as(

  select id, seq, max(score)
  from union_data
  GROUP BY id

  )

select * from max_calc
; 

The error I am getting is:

 Query Error: error: column "union_data.seq" must appear in the GROUP BY clause or be used in an aggregate function 

I do not understand this error. I am grouping the data based on common id not seq. Why I should include column "union_data.seq" in GROUPBY.

Thanks

GMB
  • 216,147
  • 25
  • 84
  • 135
rshar
  • 1,381
  • 10
  • 28

3 Answers3

3

In Postgres, you can use handy extension distinct on for this:

select distinct on (id) u.*
from union_data u
order by id, score desc
GMB
  • 216,147
  • 25
  • 84
  • 135
  • How was the max calculated without MAX function. Sorry for a basic question!! – rshar Feb 18 '20 at 15:30
  • @rshar: that's exactly what `distinct on` is here for. The trick is in the `order by` clause (ordering by descending `score` per `id`). You can have a look at the link to the documentation that I put in my answer. – GMB Feb 18 '20 at 15:32
0

GMB's answer is the best answer for three columns (and duly upvoted). If you want more aggregations, though, you can emulate a "first" aggregation function using arrays:

  select id,
         (array_agg(seq order by score desc))[1] as seq,
         max(score)
  from union_data
  group by id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

The answers previously given show how to correct/circumvent the original error. However they do not address the actual inquiry as to the cause of the error. So lets return to the original query.

  select id, seq, max(score)
  from union_data
  GROUP BY id 

This query results in the error. The cause being that the non-aggregate column seq is omitted from the grouping. SQL syntax rules require all non-aggregate columns in the select list be in the "group by" clause when aggregate function(s) are also in the column list. That is why Postgres has the "distinct by" extension. It essentially permits bypassing the SQL syntax rule, but it's not a free lunch. Distinct By imposes its own requirements.

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. ... The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.]

Belayer
  • 13,578
  • 2
  • 11
  • 22