In Hive, pages that are repeated in the log data, want to separate and order sessions with only the time remaining when they first
ID Page Timestamp Orestes Login 152356 Orestes Login 152360 Orestes Account view 152368 Orestes Account view 152372 Orestes Transfer 152380 Orestes Account view 152382 Orestes Account view 152390 Orestes Loan 152393 Antigone Login 152382 Antigone Transfer 152390 Antigone Account view 152392 Antigone Account view 152395 Antigone Trust 152399
I'd like to change it as below.
ID Page Timestamp Sequence Orestes Login 152356 1 Orestes Account view 152368 2 Orestes Transfer 152380 3 Orestes Account view 152382 4 Orestes Loan 152393 5 Antigone Login 152382 1 Antigone Transfer 152390 2 Antigone Account view 152392 3 Antigone Trust 152399 4
Table script is...
insert into log values('Orestes','Login',152356)
insert into log values('Orestes','Login',152360)
insert into log values('Orestes','Account view',152368)
insert into log values('Orestes','Account view',152372)
insert into log values('Orestes','Transfer',152380)
insert into log values('Orestes','Account view',152382)
insert into log values('Orestes','Account view',152390)
insert into log values('Orestes','Loan',152393)
insert into log values('Antigone','Login',152382)
insert into log values('Antigone','Transfer',152390)
insert into log values('Antigone','Account view',152392)
insert into log values('Antigone','Account view',152395)
insert into log values('Antigone','Trust',152399)```
For this job,
With cte as
(
Select id, page, min(timestamp) timestamp from log group by id, page)
)
Select id, page, timestamp, rank() over (partition by id order by timestamp) from log
In this case, however, one of the account view of the orestes is lost. How can I solve this problem?