I have 100,000 rows of data in the following format:
import pandas as pd
data = {'ID': [1, 1, 3, 3, 4, 3, 4, 4, 4],
'timestamp': ['12/23/14 16:53', '12/23/14 17:00', '12/23/14 17:01', '12/23/14 17:02', '12/23/14 17:00', '12/23/14 17:06', '12/23/14 17:15', '12/23/14 17:16', '12/23/14 17:20']}
df = pd.DataFrame(data)
ID timestamp
0 1 2014-12-23 16:53:00
1 1 2014-12-23 17:00:00
2 3 2014-12-23 17:01:00
3 3 2014-12-23 17:02:00
4 4 2014-12-23 17:00:00
5 3 2014-12-23 17:06:00
6 4 2014-12-23 17:15:00
7 4 2014-12-23 17:16:00
8 4 2014-12-23 17:20:00
The ID
represents a user and the timestamp
is when that user visited a website. I want to get information about sessions using pandas, where each session on this site is a maximum of 15 mins long. A new session starts once the user has been logged on for 15 mins. For the above sample data, the desired result would be:
ID session_start session_duration
0 1 12/23/14 16:53. 7 min
1 3 12/23/14 17:02. 4 min
2 4 12/23/14 17:00. 15 min
3 4 12/23/14 17:16. 4 min
Let me know if there is information I should add to clarify. I can't seem to get a working solution. Any help is appreciated!
EDIT: In response to queries below, I noticed a mistake in my example. Sorry guys it was very late at night!
The problem that I am struggling with has mostly to do with user 4. They are still logged in after 15 minutes, and I want to capture in my data that a new session has started.
The reason that my problem is slightly different from this Groupby every 2 hours data of a dataframe is because I want to do this based on individual users.