Selecting max date of each month

Question

I have a table with a lot of cumulative columns; these columns reset to 0 at the end of each month. If I sum this data, I'll end up double counting. Instead, with Hive, I'm trying to select the max date of each month.

I've tried this:

SELECT
    yyyy_mm_dd,
    id,
    name,
    cumulative_metric1,
    cumulative_metric2
FROM
    mytable

WHERE
    yyyy_mm_dd = last_day(yyyy_mm_dd)

mytable has daily data from the start of the year. In the output of the query above, I only see the last date for January, not February. How can I select the last day of each month?

Gordon Linoff · Accepted Answer · 2020-02-10T21:38:01.010

score 1

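February is not over yet, so the table has no row for February 29 and the filter yyyy_mm_dd = last_day(yyyy_mm_dd) matches nothing in February. Perhaps a window function that finds the latest date actually present in each month does what you want: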

SELECT yyyy_mm_dd, id, name, cumulative_metric1, cumulative_metric2
FROM (SELECT t.*,
             MAX(yyyy_mm_dd) OVER (PARTITION BY last_day(yyyy_mm_dd)) as last_yyyy_mm_dd
      FROM mytable t
     ) t
WHERE yyyy_mm_dd = last_yyyy_mm_dd;

This calculates the last day present in the data for each month, rather than the calendar month end.
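A quick check of what Hive's built-in last_day returns for a date in the unfinished month shows why the question's filter finds no February rows. The date literal is only an example, chosen to match the date of this thread:

```sql
-- last_day returns the last calendar day of the month, not the last date present in mytable.
-- The input date is an example value only.
SELECT last_day('2020-02-10');
-- 2020 is a leap year, so this returns 2020-02-29, a date with no rows yet on February 10.
```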

edited Feb 10 '20 at 21:38

answered Feb 10 '20 at 13:20


Should the inner table be aliased to `t` also? Also, this is a partitioned table so the inner table would also need `yyyy_mm_dd` in a `where` condition as that's the partitioned column. – stackq Feb 10 '20 at 13:28
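A sketch of the accepted query with a filter on the partition column added to the inner query, so Hive can prune partitions (and so the query is accepted if strict mode requires a partition predicate). It assumes yyyy_mm_dd is a string partition column in yyyy-MM-dd format; the lower bound '2020-01-01' is only an example. Reusing t as both the inner and outer alias should be legal because the two aliases live in different scopes; distinct names are used here only for readability.

```sql
SELECT yyyy_mm_dd, id, name, cumulative_metric1, cumulative_metric2
FROM (SELECT t.*,
             -- latest date present in the data for each calendar month
             MAX(yyyy_mm_dd) OVER (PARTITION BY last_day(yyyy_mm_dd)) AS last_yyyy_mm_dd
      FROM mytable t
      -- assumed example bound on the partition column; adjust to the partitions you need
      WHERE yyyy_mm_dd >= '2020-01-01'
     ) x
WHERE yyyy_mm_dd = last_yyyy_mm_dd;
```

If the upper end of the range cuts a month short, that month's "last day" becomes the latest date inside the range, which is consistent with picking the last day present in the data.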

score 0 · Answer 2 · answered Feb 10 '20 at 13:20

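Use a correlated subquery that takes the maximum date within the same year and month, using Hive's year() and month() date functions:

SELECT
    t1.yyyy_mm_dd,
    t1.id,
    t1.name,
    t1.cumulative_metric1,
    t1.cumulative_metric2
FROM
    mytable t1

WHERE
    t1.yyyy_mm_dd = (SELECT MAX(t2.yyyy_mm_dd)
                     FROM mytable t2
                     WHERE year(t2.yyyy_mm_dd) = year(t1.yyyy_mm_dd)
                       AND month(t2.yyyy_mm_dd) = month(t1.yyyy_mm_dd))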

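Depending on the Hive version, a scalar subquery in WHERE may be rejected (older releases only support IN/EXISTS-style subqueries there). In that case, the same idea can be sketched as a join against the per-month maximum date. This assumes yyyy_mm_dd holds yyyy-MM-dd dates and has not been tested against a particular Hive release:

```sql
-- Latest date per (year, month), joined back to keep only the rows on those dates.
SELECT t1.yyyy_mm_dd, t1.id, t1.name, t1.cumulative_metric1, t1.cumulative_metric2
FROM mytable t1
JOIN (SELECT MAX(yyyy_mm_dd) AS max_yyyy_mm_dd
      FROM mytable
      GROUP BY year(yyyy_mm_dd), month(yyyy_mm_dd)
     ) m
  -- each month's max date is distinct, so joining on the date alone is enough
  ON t1.yyyy_mm_dd = m.max_yyyy_mm_dd;
```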