0

I'm working with Impala and Hive, so I don't know if I have fancy functions that Oracle or MSQL provide

So , having a table of values and timestamps

 |-----------|------------------|
 |   value   |     timestamp    |
 |-----------|------------------|
 |     2     |    12:02:34      | 
 |-----------|------------------|
 |     5     |    12:06:30      |
 |-----------|------------------|
 |     2     |    13:01:33      |
 |-----------|------------------|
 |     2     |    13:04:00      |

I want to get an extra column with the rolling average for the last hour only

 |-----------|------------------|--------|
 |   value   |     timestamp    |  avg   |
 |-----------|------------------|--------|
 |     2     |    12:02:34      |   2    |
 |-----------|------------------|--------|
 |     5     |    12:06:30      |  3.5   |
 |-----------|------------------|--------|
 |     2     |    13:01:33      |   3    |
 |-----------|------------------|--------|
 |     2     |    13:04:00      |   3    |

There can be a variable number of values for each hour, so I cannot do a Window over a number of rows, because i don't know how many rows it implies.

Any suggestions?

habarnam
  • 78
  • 6
  • Does this answer your question? [HIVE - compute statistics over partitions with window based on date](https://stackoverflow.com/questions/65479732/hive-compute-statistics-over-partitions-with-window-based-on-date) – mck Feb 03 '21 at 13:58

1 Answers1

1

This is a little tricky. Although Hive supports range window frames, it only supports them with numbers, not intervals.

So, you need to convert the timestamp to a number and then use that:

select t.*,
       avg(value) over (order by unix_timestamp(timestamp)
                        range between 3559 preceding and current row
                       )
from t;

3559 = 60 * 60 - 1, which is one second less than an hour. One second less is used because the window frame includes the current row.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • i ran it in impala, because of the speed, so now i get this error :AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. – habarnam Feb 03 '21 at 14:02
  • and this is the error in hive: Error while compiling statement: FAILED: ParseException line 137:40 cannot recognize input near '*' '60' '-' in windowframeboundary – habarnam Feb 03 '21 at 14:11
  • @habarnam . . . The question is tagged "hive" and this is a Hive solution. – Gordon Linoff Feb 03 '21 at 14:13
  • i know gordon,, sometimes the queries work on both. but this gives me an error in hive also – habarnam Feb 03 '21 at 14:15
  • ok i jusr replaced 60*60-1 with 3559 , it started running, hopefully i-ll get the end result – habarnam Feb 03 '21 at 14:18
  • 1
    @habarnam . . . Interesting. I didn't realize that constant expressions were not accepted there. I'll fix the answer. – Gordon Linoff Feb 03 '21 at 14:19