How to avoid nested 'for loops' in Python ?

Question

I have dataframe (df) of the form :

        SERV_OR_IOR_ID  IMP_START_TIME   IMP_CLR_TIME     TIME_BIN
0         -1447310116  23:59:32.873000  00:11:28.755000
1          1673545041  00:00:09.182000  00:01:06.912000
2          -743717696  23:59:57.312000  00:00:32.428000
3          -746373244  23:59:57.915000  00:05:33.232000

I am dividing the 24hrs of a day into 'Time Bins' of 30 minutes, so the zeroth time bin would be from 00:00-00:30, first from 00:30-01:00 and so on.Post this, I want to assign a value in the 'TIME_BIN' column for each row depending on which time window does the 'IMP_START_TIME' fall in. For example : For the row '1', I would assign the value '0' since it falls in the time window '00:00'-00:30.

For this, I wrote the following code :

interval = dt.timedelta(minutes=30)
start = dt.time(0,0,0)

grid =[(dt.datetime.combine(dt.date(1,1,1),start)+n*interval).time() for n in range(48)]


for j in range(len(df)):         
    for i in range(0,47):
        if df.ix[j,1]  <grid[i+1] and df.ix[j,1]  > grid[i]:
            df.ix[j,3]  = i

        elif  df.ix[j,1]  > grid[47]:
            df.ix[j,3]  = 47

This code takes a lot of time to run because of the nested for loops. Is there a more efficient way to do the same thing ?

use [zip](https://docs.python.org/2/library/functions.html#zip) to minimize indentation levels — Azat Ibrakov, Apr 24 '17 at 09:01
Can't you just add up the minutes and the hours of each line in the dataframe and then divide it by 30? 1:23 -> 83 -> 83/30 -> 2 — Daniel Frühauf, Apr 24 '17 at 09:03

zipa · Answer 1 · 2017-04-24T10:22:32.337

1

This should be faster:

index = pd.date_range('1/1/2000', periods=48, freq='30T').time
index = {v: i for i, v in enumerate(index)}
df['TIME_BIN'] = pd.to_datetime(df['IMP_CLR_TIME']).dt.floor('30T').dt.time.map(index)

edited Apr 24 '17 at 10:22

answered Apr 24 '17 at 09:18

zipa

27,316
6
40
58

score 1 · Answer 2 · answered Apr 24 '17 at 10:20

1

You can calculate required bin, instead of lookup. It shall save a lot:

def halfhour_bin(time):
    return ((time.hour*60)+time.minute)//30

for j in range(len(df)):        
    df.ix[j,3] = halfhour_bin(df.ix[j,1])

answered Apr 24 '17 at 10:20

Ariksu

83
1
9

How to avoid nested 'for loops' in Python ?

2 Answers2