1

In Hive, pages that are repeated in the log data, want to separate and order sessions with only the time remaining when they first

ID          Page            Timestamp
Orestes     Login           152356
Orestes     Login           152360
Orestes     Account view    152368
Orestes     Account view    152372
Orestes     Transfer        152380
Orestes     Account view    152382
Orestes     Account view    152390
Orestes     Loan            152393
Antigone    Login           152382
Antigone    Transfer        152390
Antigone    Account view    152392
Antigone    Account view    152395
Antigone    Trust           152399

I'd like to change it as below.

ID          Page            Timestamp   Sequence
Orestes     Login           152356      1
Orestes     Account view    152368      2
Orestes     Transfer        152380      3
Orestes     Account view    152382      4
Orestes     Loan            152393      5
Antigone    Login           152382      1
Antigone    Transfer        152390      2
Antigone    Account view    152392      3
Antigone    Trust           152399      4

Table script is...

insert into log values('Orestes','Login',152356)
insert into log values('Orestes','Login',152360)
insert into log values('Orestes','Account view',152368)
insert into log values('Orestes','Account view',152372)
insert into log values('Orestes','Transfer',152380)
insert into log values('Orestes','Account view',152382)
insert into log values('Orestes','Account view',152390)
insert into log values('Orestes','Loan',152393)
insert into log values('Antigone','Login',152382)
insert into log values('Antigone','Transfer',152390)
insert into log values('Antigone','Account view',152392)
insert into log values('Antigone','Account view',152395)
insert into log values('Antigone','Trust',152399)```

For this job,

With cte as
(
Select id, page, min(timestamp) timestamp from log group by id, page)
)
Select id, page, timestamp, rank() over (partition by id order by timestamp) from log

In this case, however, one of the account view of the orestes is lost. How can I solve this problem?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
orestes
  • 25
  • 4

1 Answers1

0

Using LAG you can find previous page and filter if it is repeated.

with log as (
select stack (13,
'Orestes','Login',           152356,
'Orestes','Login',           152360,
'Orestes','Account view',    152368,
'Orestes','Account view',    152372,
'Orestes','Transfer',        152380,
'Orestes','Account view',    152382,
'Orestes','Account view',    152390,
'Orestes','Loan',            152393,
'Antigone','Login',           152382,
'Antigone','Transfer',        152390,
'Antigone','Account view',    152392,
'Antigone','Account view',    152395,
'Antigone','Trust',           152399
) as (ID,Page,Timestamp) 
)

select id, page, timestamp, row_number() over(partition by id order by timestamp) sequence
from
(
select id, page, timestamp, lag(page) over(partition by id order by timestamp) prev_page
  from log
)s 
where (prev_page!=page) or (prev_page is null)
;

Result:

OK
Antigone        Login   152382  1
Antigone        Transfer        152390  2
Antigone        Account view    152392  3
Antigone        Trust   152399  4
Orestes Login   152356  1
Orestes Account view    152368  2
Orestes Transfer        152380  3
Orestes Account view    152382  4
Orestes Loan    152393  5
Time taken: 9.359 seconds, Fetched: 9 row(s)
leftjoin
  • 36,950
  • 8
  • 57
  • 116