14

I create a table in HIVE. It has the following columns:

id bigint, rank bigint, date string

I want to get avg(rank) per month. I can use this command. It works.

select a.lens_id, avg(a.rank)
from tableA a
group by a.lens_id, year(a.date_saved), month(a.date_saved); 

However, I also want to get date information. I use this command:

select a.lens_id, avg(a.rank), a.date_saved
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);

It complains: Expression Not In Group By Key

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
chnet
  • 1,993
  • 9
  • 36
  • 51

3 Answers3

16

The full error message should be in the format Expression Not In Group By Key [value].
The [value] will tell you what expression needs to be in the Group By.

Just looking at the two queries, I'd say that you need to add a.date_saved explicitly to the Group By.

QuinnG
  • 6,346
  • 2
  • 39
  • 47
  • 3
    Yes. After adding a.date_saved, it works. However, it does not do what I want. I want avg(rank) per month. Now it does not do average. It just shows all records since adding group by a.date_saved. – chnet Apr 21 '11 at 17:38
  • 3
    @chnet: You can't have a column selected and not have it grouped by that column. If you want to display `a.date_saved` you need to group by it. You might be able to display `year(a.date_saved)` and `month(a.date_saved)` since those are in the `Group by` but not 100% on that. – QuinnG Apr 21 '11 at 17:44
  • Thank you. I am able to display year(a.date_saved) and month(a.date_saved). – chnet Apr 21 '11 at 17:52
  • 4
    what do you mean ? is it something unique for HQL? i am doing group by with not all columns in MySQL and it work'd fine – Omer Anisfeld Feb 10 '20 at 07:20
  • No help. fix by collect_set()[0] – Yin Feb 13 '20 at 16:42
14

A walk around is to put the additional field in a collect_set and return the first element of the set. For example

select a.lens_id, avg(a.rank), collect_set(a.date_saved)[0]
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
vanship82
  • 289
  • 3
  • 3
0

This is because there is more than one ‘date_saved’ record under your group by. You can turn these ‘date_saved’ records into arrays and output them.

Y.Wang
  • 11
  • 3
  • (This post does not seem to provide a [quality answer](https://stackoverflow.com/help/how-to-answer) to the question. Please either edit your answer and maybe show an example, or just post it as a comment to the question.) – sɐunıɔןɐqɐp Sep 21 '18 at 08:17