
My DF looks as follows:

date | customer_id | session_id | current_page | previous_page | ti_1
2016-02-14 16:37:00 | 9999 | ABC | STRING_A | STRING_B | STRING_C
2016-02-14 16:38:10 | 9999 | ABC | STRING_D | STRING_E | STRING_C
2016-02-14 16:38:40 | 9999 | ABC | STRING_F | STRING_G | STRING_C
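
To make the example reproducible, the table above can be rebuilt as an R data frame. This is only a sketch: the post does not say what class `date` has, so it is assumed here to be POSIXct in UTC, and the string columns are assumed to be plain character vectors.

```r
# Reproducible version of the example data.
# Assumption: `date` is POSIXct (UTC); the original post does not state its class.
DF <- data.frame(
  date = as.POSIXct(c("2016-02-14 16:37:00",
                      "2016-02-14 16:38:10",
                      "2016-02-14 16:38:40"), tz = "UTC"),
  customer_id   = 9999,                 # length-1 values are recycled to 3 rows
  session_id    = "ABC",
  current_page  = c("STRING_A", "STRING_D", "STRING_F"),
  previous_page = c("STRING_B", "STRING_E", "STRING_G"),
  ti_1          = "STRING_C",
  stringsAsFactors = FALSE
)
str(DF)
```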

I would like to extract the last row (the one with max(date)) for each combination of customer_id, session_id and ti_1. For my DF, the result should therefore look like this (I am interested in "current_page" and "previous_page"):

2016-02-14 16:38:40 | 9999 | ABC | STRING_F | STRING_G | STRING_C

I have tried:

library(dplyr)

testi <- group_by(DF, customer_id, session_id,
                  current_page, previous_page, ti_1)

testi_2 <- summarise(testi, max(date))

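However, this evidently does not work: because current_page and previous_page are grouping variables, every distinct combination of strings becomes its own group, so no rows are collapsed. I suspect I have to do this with window functions. How is this done?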
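
One way to express the "window function" idea in dplyr is a grouped `filter()`: inside it, `max(date)` is evaluated per group, so each row is compared with the latest date of its own group. The sketch below rebuilds the question's data (assuming `date` is POSIXct), first reproduces the original attempt to show that it keeps one group per row, then groups only by the key columns. Note that this filter keeps every row that ties for the latest date in a group.

```r
library(dplyr)

# The question's example data; `date` is assumed to be POSIXct.
DF <- data.frame(
  date = as.POSIXct(c("2016-02-14 16:37:00",
                      "2016-02-14 16:38:10",
                      "2016-02-14 16:38:40"), tz = "UTC"),
  customer_id   = 9999,
  session_id    = "ABC",
  current_page  = c("STRING_A", "STRING_D", "STRING_F"),
  previous_page = c("STRING_B", "STRING_E", "STRING_G"),
  ti_1          = "STRING_C",
  stringsAsFactors = FALSE
)

# The original attempt: the page columns are grouping variables,
# so each row forms its own group and nothing is collapsed.
attempt <- DF %>%
  group_by(customer_id, session_id, current_page, previous_page, ti_1) %>%
  summarise(max_date = max(date)) %>%
  ungroup()
nrow(attempt)  # 3

# Group only by the key columns; max(date) is computed per group,
# so the filter keeps the whole latest row, page columns included.
last_rows <- DF %>%
  group_by(customer_id, session_id, ti_1) %>%
  filter(date == max(date)) %>%
  ungroup()
last_rows
```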

Asked by koVex; edited by nrussell.
  • You could try `group_by(DF, customer_id, session_id, ti_1) %>% slice(which.max(date))` – talat Feb 15 '16 at 21:53
  • Thanks a lot @docendodiscimus, this works. Could you explain the logic behind `%>% slice(which.max(date))`? I understand the group_by part, but not the `%>%` operator or `slice(which.max(date))`. – koVex Feb 15 '16 at 22:07
  • `%>%` is a pipe operator: it passes the result of the group_by (on its left) as the first argument to the function on its right. You can read more about it in the "Introduction to dplyr" vignette. You can also type `?slice` and `?which.max` to learn more about those functions. – talat Feb 15 '16 at 22:12
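
The explanation in the comments can be made concrete with a small runnable sketch. `which.max()` returns the position of the first maximum in a vector, `slice()` keeps the rows at the given positions within each group, and `lhs %>% f(...)` is evaluated as `f(lhs, ...)`. The data frame is the question's example rebuilt in R, with `date` assumed to be POSIXct.

```r
library(dplyr)

# The question's example data; `date` is assumed to be POSIXct.
DF <- data.frame(
  date = as.POSIXct(c("2016-02-14 16:37:00",
                      "2016-02-14 16:38:10",
                      "2016-02-14 16:38:40"), tz = "UTC"),
  customer_id   = 9999,
  session_id    = "ABC",
  current_page  = c("STRING_A", "STRING_D", "STRING_F"),
  previous_page = c("STRING_B", "STRING_E", "STRING_G"),
  ti_1          = "STRING_C",
  stringsAsFactors = FALSE
)

# which.max() gives the index of the first maximum; ties go to the earliest.
which.max(c(3, 7, 7, 1))  # 2

# Piped form from the comment: group, then keep the row with the latest date.
piped <- DF %>%
  group_by(customer_id, session_id, ti_1) %>%
  slice(which.max(date)) %>%
  ungroup()

# The same pipeline without the pipe: each step's result becomes
# the first argument of the next function.
nested <- ungroup(slice(group_by(DF, customer_id, session_id, ti_1),
                        which.max(date)))

identical(piped, nested)  # TRUE
piped
```

Because `which.max()` picks only the first maximum, this returns exactly one row per group even when several rows share the latest date. In dplyr 1.0.0 and later, `slice_max(date, n = 1, with_ties = FALSE)` expresses the same selection.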
