How do I find values (userIDs) that frequently occur together, based on a timestamp?
My question is related to this question: Session generation from log file analysis with pandas. However, my data is already sessionized. I want to go a step further and find users who log in at the same time, meaning that their 'sessionBegin' values are close together.
Of course we have to choose a granularity. Let us assume that users whose 'sessionBegin' values are less than 30 minutes apart logged in at the same time.
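To make the 30-minute criterion concrete, here is a minimal sketch of how I would express it with pandas. The names `WINDOW` and `same_time` are hypothetical helpers introduced only for illustration.

```python
import pandas as pd

# Assumed granularity from the text above: less than 30 minutes apart.
WINDOW = pd.Timedelta(minutes=30)


def same_time(a, b, window=WINDOW):
    """Return True if two session starts are less than `window` apart."""
    return abs(pd.Timestamp(a) - pd.Timestamp(b)) < window


close = same_time("2014-05-07 14:15", "2014-05-07 14:20")  # 5 minutes apart
far = same_time("2014-05-07 14:15", "2014-05-07 15:01")    # 46 minutes apart
```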
# my data (a Series with a two-level index):
                  sessionBegin
userID sessionID
A      1          2014-5-7 14:15
A      2          2014-5-8 16:30
B      3          2014-5-7 20:33
C      4          2014-5-7 14:20
C      5          2014-5-7 18:58
C      5          2014-5-8 16:30
D      6          2014-5-7 15:01
D      6          2014-5-8 12:04
In this example there is clearly a co-occurrence (statistical dependence?) between userIDs A and C.
I was thinking of setting the timestamp as the index and using a rolling window of 30 minutes, but I did not know how to recognize recurring sets of userIDs. Is it possible to recognize not only pairs of userIDs but also larger sets?
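For reference, this is a rough sketch of the window idea, not a definitive solution. It assumes the data has been flattened to one row per session with the columns `userID` and `sessionBegin`. The function `co_occurrence_counts` and its `max_size` parameter are hypothetical names. Sessions are sorted by time; each session acts as an anchor, and every group of distinct users whose sessions start less than 30 minutes after the anchor is counted once.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from above, flattened to one row per session (assumed layout).
df = pd.DataFrame(
    {
        "userID": ["A", "A", "B", "C", "C", "C", "D", "D"],
        "sessionBegin": pd.to_datetime(
            [
                "2014-05-07 14:15", "2014-05-08 16:30", "2014-05-07 20:33",
                "2014-05-07 14:20", "2014-05-07 18:58", "2014-05-08 16:30",
                "2014-05-07 15:01", "2014-05-08 12:04",
            ]
        ),
    }
)

WINDOW = pd.Timedelta(minutes=30)


def co_occurrence_counts(df, window=WINDOW, max_size=3):
    """Count groups of distinct users whose sessions start within `window`.

    Groups of 2 up to `max_size` users are counted. Each group of sessions
    is counted once, anchored at its earliest session.
    """
    ordered = df.sort_values("sessionBegin").reset_index(drop=True)
    times = ordered["sessionBegin"].tolist()
    users = ordered["userID"].tolist()
    counts = Counter()
    for i, (t0, u0) in enumerate(zip(times, users)):
        # Users of later sessions that start less than `window` after t0.
        later = []
        for j in range(i + 1, len(times)):
            if times[j] - t0 >= window:
                break  # sorted by time, so no further session can qualify
            later.append(users[j])
        # Combine the anchor user with 1 .. max_size - 1 of the later users.
        for size in range(1, max_size):
            for combo in combinations(later, size):
                group = frozenset((u0,) + combo)
                if len(group) == size + 1:  # skip groups repeating a user
                    counts[group] += 1
    return counts


counts = co_occurrence_counts(df)
for group, n in counts.most_common():
    print(sorted(group), n)
```

On the sample data this reports only the group A and C, which matches what I see by eye. Raising `max_size` would also count larger sets, but the number of combinations grows quickly in busy windows, so I am unsure whether this scales.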