Grouping by a Range of Numbers with Aggregate Function

Question

I have data in Hive that has two columns of interest. The first is a column (int) that represents a date (YYYYMM) and the second is a column (int) that represents a number of people for that date.

date    people
202107  10
202001  2
202002  4
202003  6
201901  3
201902  5
201903  7
201809  11

I'm trying to write a query that sums up the number of people for comparison across two quarters. For example, I want the total of people that are between 202001 and 202003 vs. the total number of people between 201901 and 201903.

The result I would want is:

"2020 Q1" 12
"2019 Q1" 15

I've tried a couple of solutions like using a SELECT CASE however when I'm trying to do a group by you can't use an alias like MySQL in Hive. I'm a little stuck on this and would appreciate some help.

Koushik Roy · Accepted Answer · 2021-04-16T05:28:44.353

1

You can use below SQL. It calculates quarter on the fly and then group by it.

Select 
CONCAT(SUBSTR(DATE,1,4),' Q', CAST(
quarter(from_unixtime(unix_timestamp(`DATE`, 'yyyyMM'))) AS STRING)), SUM(PEOPLE)
GROUP BY 
CONCAT(SUBSTR(DATE,1,4),' Q', CAST(
quarter(from_unixtime(unix_timestamp(`DATE`, 'yyyyMM'))) AS STRING))
ORDER BY 1

edited Apr 16 '21 at 05:28

answered Apr 16 '21 at 04:23

Koushik Roy

6,868
2
12
33

score 0 · Answer 2 · edited Apr 18 '21 at 07:33

0

SELECT date,sum(people) FROM a WHERE date<= 202003 and date >= 202001
UNION 
SELECT date,sum(people) FROM a WHERE date<= 201903 and date >= 201901

edited Apr 18 '21 at 07:33

Nicholas Gentile

1,512
1
9
16

answered Apr 16 '21 at 03:45

Wang

1

Grouping by a Range of Numbers with Aggregate Function

2 Answers2