I want to make a cumulative count using Hive SQL in recorrencia
column according to the other ones.
+------------+---------+-------+--------------+--+
| t.ano_mes | t.site | t.uf | recorrencia |
+------------+---------+-------+--------------+--+
| 202001 | 174 | AM | 1 |
| 202002 | 174 | AM | 1 |
| 202003 | 174 | AM | 1 |
| 202004 | 174 | AM | 1 |
| 202005 | 174 | AM | 1 |
| 202006 | 174 | AM | 1 |
| 202007 | 174 | AM | 1 |
| 202008 | 174 | AM | 1 |
| 202005 | 1JN | SP | 1 |
| 202006 | 1JN | SP | 1 |
| 202005 | 1LJ | SP | 1 |
| 202009 | 1LJ | SP | 1 |
| 202001 | 1RG | SP | 1 |
| 202002 | 1RG | SP | 1 |
| 202003 | 1RG | SP | 1 |
| 202004 | 1RG | SP | 1 |
| 202005 | 1RG | SP | 1 |
| 202006 | 1RG | SP | 1 |
| 202007 | 1RG | SP | 1 |
Desired output
+------------+---------+-------+--------------+--------+
| t.ano_mes | t.site | t.uf | recorrencia |cum_rec
+------------+---------+-------+--------------+--------+
| 202001 | 174 | AM | 1 |1
| 202002 | 174 | AM | 1 |2
| 202003 | 174 | AM | 1 |3
| 202004 | 174 | AM | 1 |4
| 202005 | 174 | AM | 1 |5
| 202006 | 174 | AM | 1 |6
| 202007 | 174 | AM | 1 |7
| 202008 | 174 | AM | 1 |8
| 202005 | 1JN | SP | 1 |1
| 202006 | 1JN | SP | 1 |2
| 202005 | 1LJ | SP | 1 |1
| 202009 | 1LJ | SP | 1 |2
| 202001 | 1RG | SP | 1 |1
| 202002 | 1RG | SP | 1 |2
| 202003 | 1RG | SP | 1 |3
| 202004 | 1RG | SP | 1 |4
| 202005 | 1RG | SP | 1 |5
| 202006 | 1RG | SP | 1 |6
| 202007 | 1RG | SP | 1 |7
I've tried a lot of functions like COUNT(*) OVER (t.ano_mes)
and COUNT(*) OVER (t.site)
but it runs the sum until the end of table, and do not restarts as the t.site
changes.
As soon as t.site
changes, the counter should restart.