I have a time series data in Impala that in this format.
One record get created when and only when there is a change, updated value represents the new data.
---------------------------------------
| Product | Year | Week | UpdatedValue |
---------------------------------------
|A | 2017 | 1 | 5 |
|A | 2017 | 5 | 10 |
|A | 2017 | 20 | 80 |
|B | 2017 | 8 | 90 |
|... | ... | ... | ... |
---------------------------------------
Assuming that our time window is 2017 full year from week one to week 52. The data above says that the value got changed for product A at the first week to be 5, 5'th week to be 10 and 20th week to be 80. I want to fill in the missing value by using the last observation carry forward logic, and if the data doesn't start from the first week, then also fill in the leading missing values with the next occurrence.
This should be the ideal output.
---------------------------------------
| Product | Year | Week | UpdatedValue |
---------------------------------------
|A | 2017 | 1 | 5 |
|A | 2017 | . | 5 |
|A | 2017 | 4 | 5 |
|A | 2017 | 5 | 10 |
|A | 2017 | 6 | 10 |
|A | 2017 | . | 10 |
|A | 2017 | 20 | 80 |
|A | 2017 | . | 80 |
|A | 2017 | 52 | 80 |
|B | 2017 | 1 | 90 |
|B | 2017 | . | 90 |
|B | 2017 | 8 | 90 |
|B | 2017 | . | 90 |
|B | 2017 | 52 | 90 |
|... | ... | ... | ... |
---------------------------------------
Is there a feasible solution Hive/Impala you can certain use their advanced analytics functions if that is easier? However, if a generic SQL solution is available, that will be great.