I have the following dataframe, df:
date_time value member
2013-10-09 09:00:00 664639 Jerome
2013-10-09 09:05:00 197290 Hence
2013-10-09 09:10:00 470186 Ann
2013-10-09 09:15:00 181314 Mikka
2013-10-09 09:20:00 969427 Cristy
2013-10-09 09:25:00 261473 James
2013-10-09 09:30:00 003698 Oliver
and a second dataframe, bounds, that holds the interval boundaries:
date_start date_end
2013-10-09 09:19:00 2013-10-09 09:25:00
2013-10-09 09:25:00 2013-10-09 09:40:00
So I need to create a new column, session, holding the index of the interval (between two datetime points) that each row falls into, something like:
date_time value member session
2013-10-09 09:00:00 664639 Jerome 1
2013-10-09 09:05:00 197290 Hence 1
2013-10-09 09:10:00 470186 Ann 1
2013-10-09 09:15:00 181314 Mikka 2
2013-10-09 09:20:00 969427 Cristy 2
2013-10-09 09:25:00 261473 James 2
2013-10-09 09:30:00 003698 Oliver 2
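One vectorized way to get such a column is to turn the rows of bounds into a pandas IntervalIndex and look up each timestamp in it. The sketch below assumes the intervals are half-open, [date_start, date_end), so a timestamp equal to a shared boundary (09:25) goes to the later interval, and that the intervals do not overlap (get_indexer requires that). With the bounds exactly as listed, the 09:00 to 09:15 rows precede the first date_start (09:19), so this sketch leaves them without a session (<NA>) rather than assigning 1 as in the expected output above; that output may assume different bounds.

```python
import pandas as pd

# Sample data from the question (the "value" column is omitted for brevity)
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-10-09 09:00:00", "2013-10-09 09:05:00", "2013-10-09 09:10:00",
        "2013-10-09 09:15:00", "2013-10-09 09:20:00", "2013-10-09 09:25:00",
        "2013-10-09 09:30:00",
    ]),
    "member": ["Jerome", "Hence", "Ann", "Mikka", "Cristy", "James", "Oliver"],
})
bounds = pd.DataFrame({
    "date_start": pd.to_datetime(["2013-10-09 09:19:00", "2013-10-09 09:25:00"]),
    "date_end": pd.to_datetime(["2013-10-09 09:25:00", "2013-10-09 09:40:00"]),
})

# One interval per row of bounds; closed="left" makes each one [start, end),
# so two intervals sharing an endpoint do not overlap
intervals = pd.IntervalIndex.from_arrays(
    bounds["date_start"], bounds["date_end"], closed="left"
)

# Position of the interval containing each timestamp, or -1 if there is none
pos = intervals.get_indexer(df["date_time"])

# 1-based session number; rows outside every interval become <NA>
df["session"] = pd.Series(pos + 1, index=df.index).where(pos >= 0).astype("Int64")
print(df)
```

The nullable "Int64" dtype keeps session integer-valued while still allowing missing labels for rows that fall in no interval.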
The following code creates the 'session' column, but doesn't write the index of the session (i.e. the index of the row in the bounds dataframe) into it, so it doesn't split the initial dataframe into intervals:
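    def create_interval():
        df['session'] = ''
        for index, row in bounds.iterrows():
            s = row['date_start']
            e = row['date_end']
            # rows strictly between this interval's start and end
            mask = (df['date_time'] > s) & (df['date_time'] < e)
            # intended: write this interval's index into 'session' for those rows
            df.loc[mask]['session'] = '[index]'
        return df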
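The loop above has two problems: df.loc[mask]['session'] = ... is chained indexing, so the assignment lands on a temporary copy and df is left unchanged, and '[index]' is a literal string rather than the loop variable. A minimal corrected sketch writes through a single .loc[rows, column] call. It also assumes a couple of changes that are not in the original: it takes df and bounds as parameters, numbers sessions from 1 in bounds row order, and uses half-open [date_start, date_end) intervals, because the strict > and < would drop the 09:25 row that sits exactly on a boundary.

```python
import pandas as pd


def create_interval(df, bounds):
    """Label each row of df with the 1-based number of the bounds row whose
    [date_start, date_end) interval contains its date_time; <NA> otherwise."""
    out = df.copy()
    # Nullable integer column so rows outside every interval can stay <NA>
    out["session"] = pd.Series(pd.NA, index=out.index, dtype="Int64")
    for number, row in enumerate(bounds.itertuples(index=False), start=1):
        mask = (out["date_time"] >= row.date_start) & (out["date_time"] < row.date_end)
        # One .loc call with (row mask, column label) writes into out itself,
        # unlike the chained out.loc[mask]["session"] = ..., which targets a copy
        out.loc[mask, "session"] = number
    return out


# Sample data from the question (the "value" column is omitted for brevity)
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-10-09 09:00:00", "2013-10-09 09:05:00", "2013-10-09 09:10:00",
        "2013-10-09 09:15:00", "2013-10-09 09:20:00", "2013-10-09 09:25:00",
        "2013-10-09 09:30:00",
    ]),
    "member": ["Jerome", "Hence", "Ann", "Mikka", "Cristy", "James", "Oliver"],
})
bounds = pd.DataFrame({
    "date_start": pd.to_datetime(["2013-10-09 09:19:00", "2013-10-09 09:25:00"]),
    "date_end": pd.to_datetime(["2013-10-09 09:25:00", "2013-10-09 09:40:00"]),
})

result = create_interval(df, bounds)
print(result)
```

The loop costs one pass over df per interval, which is fine for a handful of bounds; with many intervals a lookup-based approach scales better.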
UPDATE
The problem is that bounds['date_start'].searchsorted(df['date_time']) doesn't give the result I want, i.e. one index value per interval: df['session'] = 1 for the first interval, 2 for the second, and so on. The session column is meant to separate the different intervals that lie between date_start and date_end of bounds.
I suspect it already increments the session index whenever df['date_time'] is not equal to a bounds['date_start'] value, which is not exactly what I'm looking for.
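searchsorted returns insertion positions, not interval labels: with the default side="left", each timestamp gets the number of date_start values strictly earlier than it, so the result grows every time a start is passed and never checks date_end. A sketch of turning it into labels, assuming bounds is sorted by date_start and its intervals do not overlap: search with side="right" and subtract 1 to get the last interval that has already started, then keep the row only if it is also before that interval's end. It calls np.searchsorted on the underlying arrays, which follows the same side semantics as the Series.searchsorted call in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the "value" column is omitted for brevity)
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-10-09 09:00:00", "2013-10-09 09:05:00", "2013-10-09 09:10:00",
        "2013-10-09 09:15:00", "2013-10-09 09:20:00", "2013-10-09 09:25:00",
        "2013-10-09 09:30:00",
    ]),
    "member": ["Jerome", "Hence", "Ann", "Mikka", "Cristy", "James", "Oliver"],
})
bounds = pd.DataFrame({
    "date_start": pd.to_datetime(["2013-10-09 09:19:00", "2013-10-09 09:25:00"]),
    "date_end": pd.to_datetime(["2013-10-09 09:25:00", "2013-10-09 09:40:00"]),
})

starts = bounds["date_start"].to_numpy()
ends = bounds["date_end"].to_numpy()
times = df["date_time"].to_numpy()

# Raw insertion positions (side="left"): count of starts strictly before each time
raw = np.searchsorted(starts, times)

# side="right" counts starts <= time; minus 1 is the last interval already
# started, or -1 if no interval has started yet
pos = np.searchsorted(starts, times, side="right") - 1

# Clip so that pos == -1 does not wrap around to ends[-1], then require
# the time to be before that interval's end
safe = np.clip(pos, 0, None)
inside = (pos >= 0) & (times < ends[safe])

# 1-based session number; rows outside every interval become <NA>
df["session"] = pd.Series(pos + 1, index=df.index).where(inside).astype("Int64")
print(raw)
print(df)
```

On the sample data, raw is [0 0 0 0 1 1 2]: the 09:30 row already counts both starts, which is the "incrementing" behaviour described above, while the masked labels are <NA> for the rows before 09:19 and 1, 2, 2 for the last three rows.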