0

I have the following table:

User ID Session ID Time Stamp Page
123 123.4 HH:MM:01 1
123 123.4 HH:MM:02 2
123 123.4 HH:MM:05 3
123 123.4 HH:MM:10 4
123 123.4 HH:MM:11 5
122 1299.1 HH:MM:01 2
122 1299.1 HH:MM:02 3
128 124.4 HH:MM:01 1
128 124.4 HH:MM:02 2

And, I am trying to get a table like this:

User ID Session ID Time Stamp Page Next Page Duration (secs)
123 123.4 HH:MM:01 1 2 1
123 123.4 HH:MM:02 2 3 3
123 123.4 HH:MM:05 3 4 5
123 123.4 HH:MM:10 4 5 1
122 1299.1 HH:MM:01 2 3 1
128 124.4 HH:MM:01 1 2 1

I know I have to use lead(), but I cannot figure out how to use the mutate and lead function together.

Any help is much appreciated. Thank you.

user2845095
  • 465
  • 2
  • 9
  • 1
    What's the `Duration (sec)` variable calculating? – guasi Jun 17 '22 at 14:43
  • @guasi, It is duration in seconds. So, I am calculating the difference between the below and above timestamps. – user2845095 Jun 17 '22 at 14:58
  • Your numbers on `Duration (sec)` are not the result of the difference between **below** and **above** timestamps. In row 3, that calculation would be 10 (below) - 2 (above) = 8. It seems you want self and below, which would give you 2-1 = 1, 5-3 = 3, 10-5 = 5, 10 - nothing = 10. In this scenario, `Duration` for row 4 should be 10, not 1. Or how are you calculating the result of row 4? – guasi Jun 17 '22 at 19:05
  • I assumed in my previous question you are calculating duration between timestamps in a **single session**. Is that the case? – guasi Jun 17 '22 at 19:11
  • @guasi, it is by user id and session id too. – user2845095 Jun 17 '22 at 19:12
  • So how are you calculating Duration for row 4? see my question above? – guasi Jun 17 '22 at 19:23
  • @guasi, for user 123 and session id 123.4, the duration is 2-1 = 1, 5-2 = 3, 10-5 = 5, and 11-10 = 1. – user2845095 Jun 18 '22 at 11:00
  • I tried using this: df <- df %>% select (timestamp, sessionid, userid, page) %>% arrange(timestamp, sessionid) %>% mutate (nextpage = lead (page)) But, I know this is incomplete. I need help figuring the rest out. – user2845095 Jun 18 '22 at 13:26

1 Answers1

0

Here is a possibility using dplyr. It's not very elegant, perhaps someone else can come up with something more succinct.

library(dplyr)

df <- tribble(
~ ID, ~ Session, ~ Timestamp, ~ Pages,
123, 123.4,  01, 1,
123, 123.4,  02, 2,
123, 123.4,  05, 3,
123, 123.4,  10, 4,
123, 123.4,  11, 5,
122, 1299.1, 01, 2,
122, 1299.1, 02, 3,
128, 124.4,  01, 1,
128, 124.4,  02, 2)


df %>% 
  group_by(ID, Session) %>% 
  mutate(Duration = lead(Timestamp) - Timestamp) %>% 
  na.omit()

-output

# A tibble: 6 × 5
     ID Session Timestamp Pages Duration
  <dbl>   <dbl>     <dbl> <dbl>    <dbl>
1   123    123.         1     1        1
2   123    123.         2     2        3
3   123    123.         5     3        5
4   123    123.        10     4        1
5   122   1299.         1     2        1
6   128    124.         1     1        1
guasi
  • 1,461
  • 3
  • 12