1

I have data in Hive that has two columns of interest. The first is a column (int) that represents a date (YYYYMM) and the second is a column (int) that represents a number of people for that date.

date    people
202107  10
202001  2
202002  4
202003  6
201901  3
201902  5
201903  7
201809  11

I'm trying to write a query that sums up the number of people for comparison across two quarters. For example, I want the total of people that are between 202001 and 202003 vs. the total number of people between 201901 and 201903.

The result I would want is:

"2020 Q1" 12
"2019 Q1" 15

I've tried a couple of solutions like using a SELECT CASE however when I'm trying to do a group by you can't use an alias like MySQL in Hive. I'm a little stuck on this and would appreciate some help.

broepke
  • 71
  • 2
  • 6

2 Answers2

1

You can use below SQL. It calculates quarter on the fly and then group by it.

Select 
CONCAT(SUBSTR(DATE,1,4),' Q', CAST(
quarter(from_unixtime(unix_timestamp(`DATE`, 'yyyyMM'))) AS STRING)), SUM(PEOPLE)
GROUP BY 
CONCAT(SUBSTR(DATE,1,4),' Q', CAST(
quarter(from_unixtime(unix_timestamp(`DATE`, 'yyyyMM'))) AS STRING))
ORDER BY 1
Koushik Roy
  • 6,868
  • 2
  • 12
  • 33
0
SELECT date,sum(people) FROM a WHERE date<= 202003 and date >= 202001
UNION 
SELECT date,sum(people) FROM a WHERE date<= 201903 and date >= 201901
Nicholas Gentile
  • 1,512
  • 1
  • 9
  • 16
Wang
  • 1