I'm trying to find the overlap between two members to see if they know each other. There is also a minimum overlap required (i.e. they need to have known each other for at least two months to form a group).
Example input DF:
time_together = 5184000 (60 days)
person_name  start_date  end_date    cut_off (start + time_together)
sally        1540627200  1545638400  1545811200
john         1543046400  1548316800  1548230400
edgar        1548316800  1553414400  1553500800
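A minimal sketch that rebuilds the example input and the cut_off column. It assumes the column names shown in the table (the loop further down uses different names), and `df` is just an illustrative variable name:

```python
import pandas as pd

# Minimum time two people must have in common: 60 days, in seconds.
time_together = 60 * 24 * 60 * 60  # 5184000

# Rows copied from the example table; column names follow that table.
df = pd.DataFrame({
    "person_name": ["sally", "john", "edgar"],
    "start_date": [1540627200, 1543046400, 1548316800],
    "end_date": [1545638400, 1548316800, 1553414400],
})

# cut_off is each person's start time plus the minimum duration.
df["cut_off"] = df["start_date"] + time_together
```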
I currently have the start date and end date as Unix timestamps in a pandas DataFrame. I've calculated a cutoff time that is the start time plus the minimum duration. I then check every person's attendance against that cutoff: if they started before the cutoff and were still there after it, I say they form a valid group (see code below).
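import pandas as pd

df_new = pd.DataFrame()
# cutoff, start_timestamp, end_timestamp and Soldier_SSN correspond to
# cut_off, start_date, end_date and person_name in the example above.
for i in range(len(df.index)):
    # True for everyone who started before person i's cutoff ...
    start_range = (df.loc[i, 'cutoff'] - df['start_timestamp'] > 0)
    # ... and who was still attending after it.
    end_range = (df.loc[i, 'cutoff'] < df['end_timestamp'])
    df_new['%s%s' % (df.loc[i, 'Soldier_SSN'], i)] = start_range & end_range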
The problem is that I now have a matrix of bools, and I need to generate an output that lists the names in each group (see below for the ideal output).
Current output DF:
   sally  john   edgar
0  True   True   False
1  True   True   False
2  False  False  False
Because sally and john have been together for the minimum time, they would form a group; edgar hasn't been with either of them long enough, so he would not.
The output would ideally be a list of lists, e.g. [[person1, person2, person5], [person3, person4]].
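One possible way to turn a boolean matrix into a list of lists is to treat it as a graph and collect its connected components. The sketch below does this with a small union-find in plain Python. The function name `groups_from_matrix` is hypothetical, and it assumes groups are transitive (if A knows B and B knows C, all three land in one group) and that people with no qualifying partner are dropped:

```python
def groups_from_matrix(names, knows):
    """Return groups of names from a boolean "knows" matrix.

    knows[i][j] is True when person i and person j meet the minimum overlap.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair that is flagged in either direction.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if knows[i][j] or knows[j][i]:
                parent[find(i)] = find(j)

    # Collect members under their root, keeping the input order.
    members = {}
    for i, name in enumerate(names):
        members.setdefault(find(i), []).append(name)
    # Keep only real groups (two or more people).
    return [group for group in members.values() if len(group) > 1]


names = ["sally", "john", "edgar"]
knows = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]
groups = groups_from_matrix(names, knows)
print(groups)  # [['sally', 'john']]
```

With a DataFrame of bools, the nested list could be obtained with something like `df_new.to_numpy().tolist()`.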
It's also hella slow, so any suggestions on how to speed this up would be great.
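On speed, one option is to replace the Python loop with NumPy broadcasting. Note that the sketch below swaps in a different test from the cutoff check: it measures the length of the shared interval, min(end) - max(start), and compares that to the minimum. The function name `overlap_matrix` and the day-based timestamps are made up for illustration and are not the example data from the question:

```python
import numpy as np


def overlap_matrix(start, end, min_overlap):
    """Boolean matrix: True where two people overlap for at least min_overlap."""
    start = np.asarray(start)
    end = np.asarray(end)
    # Broadcasting pairs every person (rows) with every other person (columns).
    latest_start = np.maximum(start[:, None], start[None, :])
    earliest_end = np.minimum(end[:, None], end[None, :])
    knows = (earliest_end - latest_start) >= min_overlap
    np.fill_diagonal(knows, False)  # nobody is paired with themselves
    return knows


# Hypothetical attendance windows, expressed in days and converted to seconds.
day = 86400
start = np.array([0, 30, 200]) * day
end = np.array([120, 150, 300]) * day
knows = overlap_matrix(start, end, 60 * day)
```

This builds the whole n-by-n matrix at once, so memory grows with the square of the number of people; for a very large frame a sort-based sweep over the intervals may be a better fit.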