I am a beginner in Python and data training. I am currently working on a dummy taxi fare calculator DataFrame, and to get better results I want to separate day-time rides from night-time rides so that I can calculate a better fare.

The code I currently have:

d['time'] = pd.to_datetime(d['start']).dt.strftime('%H:%M')

for time in d['time']:
    hourMin = time.split(":")
    hour = int(hourMin[0])
    mins = int(hourMin[1])
    if hour >= 6 and hour <= 20:
        if(hour == 18):
            if(mins > 0):
                dtime = '0'
            else:
                dtime = '1'
        else:
            dtime = '1'
    else:
        day_time = '0'
        
    dtime[:10]
    d['time'] = dtime
    
d.head()

When I run this, I get `IndexError: list index out of range` on the line `mins = int(hourMin[1])`.

I would really appreciate any help, since I have been struggling with this for the last 4-5 hours.

aiox
  • Regarding day/night logic, [this answer](https://stackoverflow.com/a/64483571/6340496) might help with how you might simplify the logic; specifically the 'Example' section. – S3DEV Nov 19 '20 at 21:30
  • Adding a print(hourMin) statement before the error could help you understand what is going on. – manu190466 Nov 19 '20 at 21:31
  • @manu190466 since it can't process further it prints `['0']` – aiox Nov 19 '20 at 21:38
  • @S3DEV I had a look at it. I think I am getting the logic, but I am way too confused to understand a script this complex right now – aiox Nov 19 '20 at 21:40

2 Answers

I think you are making it too complicated. You can compare the .time of the column with time objects (using `from datetime import time`):

(
    (time(6) <= di.time) & (di.time <= time(18))
) | (
    (di.time >= time(19)) & (di.time < time(21))
)

This checks whether the time is between 6:00 and 18:00 (both inclusive), or between 19:00 (inclusive) and 21:00 (exclusive).

This is also more efficient, since the comparison runs over the whole column at once instead of looping over the rows. For example:

>>> di
DatetimeIndex(['2015-04-24 02:00:00', '2015-11-26 23:00:00',
               '2016-01-18 00:00:00', '2016-06-27 22:00:00',
               '2016-08-12 17:00:00', '2016-10-21 11:00:00',
               '2016-11-07 11:00:00', '2016-12-09 23:00:00',
               '2017-02-20 01:00:00', '2017-06-17 18:00:00'],
              dtype='datetime64[ns]', freq=None)
>>> ((time(6) <= di.time) & (di.time <= time(18))) | ((di.time >= time(19)) & (di.time < time(21)))
array([False, False, False, False,  True,  True,  True, False, False,
        True])

You can convert it to an int with .astype(int):

>>> (((time(6) <= di.time) & (di.time <= time(18))) | ((di.time >= time(19)) & (di.time < time(21)))).astype(int)
array([0, 0, 0, 0, 1, 1, 1, 0, 0, 1])
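
Applied to a DataFrame column rather than a DatetimeIndex, the same comparison might look like the sketch below. The sample rows and the `daytime` column name are assumptions made up for illustration; only the `start` column name comes from the question. It goes through `.dt.time`, which gives one `datetime.time` object per row.

```python
from datetime import time

import pandas as pd

# Hypothetical sample data; only the "start" column name comes from the question.
d = pd.DataFrame({
    "start": [
        "2020-11-19 02:00", "2020-11-19 06:00", "2020-11-19 18:00",
        "2020-11-19 18:30", "2020-11-19 20:59", "2020-11-19 21:00",
    ]
})

# .dt.time yields a Series of datetime.time objects, one per row.
t = pd.to_datetime(d["start"]).dt.time

# Same two windows as above: [06:00, 18:00] or [19:00, 21:00).
is_day = ((t >= time(6)) & (t <= time(18))) | ((t >= time(19)) & (t < time(21)))

# "daytime" is an assumed column name, kept separate from "start".
d["daytime"] = is_day.astype(int)
print(d)
```

Note that a loop such as `for time in d['time']:` leaves a string bound to the name `time` afterwards. If that loop ran earlier in the same session, it would shadow the imported class, which would be consistent with a `'str' object is not callable` error.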
Willem Van Onsem
  • Thank you for your reply, but I am quite unsure how I can apply this to my code. Can you please clarify a bit more? – aiox Nov 19 '20 at 23:06
  • `di = pd.to_datetime(d['start'])`, then you set it as `d['time'] = (((time(6) <= di.time) & (di.time <= time(18))) | ((di.time >= time(19)) & (di.time < time(21)))).astype(int)`. – Willem Van Onsem Nov 19 '20 at 23:08
  • Unfortunately receiving this error. `TypeError: 'str' object is not callable` – aiox Nov 19 '20 at 23:26
  • @aiox: but we do not call anything. You did import `from datetime import time`? – Willem Van Onsem Nov 19 '20 at 23:27
  • @aiox: furthermore any local variable named `time` should be renamed... – Willem Van Onsem Nov 19 '20 at 23:28
  • I checked all the previous lines. No `time`. I had it imported, I renamed it to test this but still no work. `di = pd.to_datetime(d['start']) d['time'] = (((time(6) <= di.time) & (di.time <= time(18))) | ((di.time >= time(19)) & (di.time < time(21)))).astype(int)` Error : `Series' object has no attribute 'time'` – aiox Nov 19 '20 at 23:42
  • @aiox: ah, but then it should be `di = pd.to_datetime(d['start']).dt` (so with `.dt`). – Willem Van Onsem Nov 19 '20 at 23:44

Pandas has some promising-sounding functions: pandas.DataFrame.between_time and pandas.DatetimeIndex.indexer_between_time

Unfortunately, between_time returns a DataFrame, and not a boolean Series. So it's not so convenient to use with .loc.

And indexer_between_time returns an integer array of index positions. That works with .iloc, but it is quite uncomfortable to use.

Furthermore, both of them require the index to be a DatetimeIndex.

First, some example data:

import pandas as pd

# Minimal sample data: 15 evenly spaced timestamps across one day
df = pd.DataFrame(pd.date_range(start="2020-11-19 00:00",
                                end="2020-11-19 23:59",
                                periods=15),
                  columns=["start"])
                 start
0  2020-11-19 00:00:00
1  2020-11-19 01:42:47
2  2020-11-19 03:25:34
3  2020-11-19 05:08:21
4  2020-11-19 06:51:08
5  2020-11-19 08:33:55
6  2020-11-19 10:16:42
7  2020-11-19 11:59:30
8  2020-11-19 13:42:17
9  2020-11-19 15:25:04
10 2020-11-19 17:07:51
11 2020-11-19 18:50:38
12 2020-11-19 20:33:25
13 2020-11-19 22:16:12
14 2020-11-19 23:59:00

Adding a new column that flags with True / False whether a row falls in the daytime. It starts out as False everywhere:

df["Daytime"] = False

Setting the index to start, the DateTime column:

df = df.set_index("start")
                    Daytime
start                      
2020-11-19 00:00:00   False
2020-11-19 01:42:47   False
2020-11-19 03:25:34   False
2020-11-19 05:08:21   False
2020-11-19 06:51:08   False
2020-11-19 08:33:55   False
2020-11-19 10:16:42   False
2020-11-19 11:59:30   False
2020-11-19 13:42:17   False
2020-11-19 15:25:04   False
2020-11-19 17:07:51   False
2020-11-19 18:50:38   False
2020-11-19 20:33:25   False
2020-11-19 22:16:12   False
2020-11-19 23:59:00   False

What are your boundaries for a timestamp to count as daytime?

DayStart = "06:30:00"
DayEnd = "18:00:00"

Creating the integer array of matching rows. You can also set include_start and include_end to have open or closed intervals.

DayTime = df.index.indexer_between_time(DayStart, DayEnd)

What do we get in return? An array of integers giving the positions of the matching rows in the index.

>>> array([ 4,  5,  6,  7,  8,  9, 10])
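
To illustrate the effect of the interval flags, here is a small sketch with hypothetical timestamps that sit exactly on the boundaries. It assumes a pandas version in which `indexer_between_time` accepts the `include_start` / `include_end` keywords mentioned above; both default to True, i.e. a closed interval.

```python
import pandas as pd

# Hypothetical timestamps chosen to sit exactly on the boundaries.
idx = pd.DatetimeIndex([
    "2020-11-19 06:30:00",
    "2020-11-19 12:00:00",
    "2020-11-19 18:00:00",
])

# Default: both ends are included.
closed = idx.indexer_between_time("06:30:00", "18:00:00")

# Excluding the end drops the row that is exactly at 18:00:00.
half_open = idx.indexer_between_time("06:30:00", "18:00:00", include_end=False)

print(closed)
print(half_open)
```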

We can now use that to set the 0th column to True:

df.iloc[DayTime,0] = True
                   Daytime
start                     
2020-11-19 00:00:00  False
2020-11-19 01:42:47  False
2020-11-19 03:25:34  False
2020-11-19 05:08:21  False
2020-11-19 06:51:08   True
2020-11-19 08:33:55   True
2020-11-19 10:16:42   True
2020-11-19 11:59:30   True
2020-11-19 13:42:17   True
2020-11-19 15:25:04   True
2020-11-19 17:07:51   True
2020-11-19 18:50:38  False
2020-11-19 20:33:25  False
2020-11-19 22:16:12  False
2020-11-19 23:59:00  False
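
If you would rather keep start as a normal column, one possible variation is to turn the integer positions into a boolean mask. This is only a sketch: it builds a temporary DatetimeIndex from the column and uses NumPy for the mask, neither of which is part of the steps above, and the names `positions` and `mask` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as above, with "start" kept as a regular column.
df = pd.DataFrame({
    "start": pd.date_range(start="2020-11-19 00:00",
                           end="2020-11-19 23:59",
                           periods=15)
})

# Temporary DatetimeIndex, used only to look up the matching positions.
positions = pd.DatetimeIndex(df["start"]).indexer_between_time("06:30:00",
                                                               "18:00:00")

# Boolean mask: True at the matching positions, False everywhere else.
mask = np.zeros(len(df), dtype=bool)
mask[positions] = True

df["Daytime"] = mask
print(df)
```

A mask like this can also be passed to .loc, for example `df.loc[mask]`, to select only the daytime rows.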

Using the between_time function returns a DataFrame matching the criterion:

df_DayFilter = df.between_time(DayStart, DayEnd)
                   Daytime
start                     
2020-11-19 06:51:08   True
2020-11-19 08:33:55   True
2020-11-19 10:16:42   True
2020-11-19 11:59:30   True
2020-11-19 13:42:17   True
2020-11-19 15:25:04   True
2020-11-19 17:07:51   True

I'd really like to know as well if there is a more elegant way to use between_time!

flom
  • Hi, thank you for your broad explanation and example. That helped me a lot to understand the logic but I have a question. Since I have a data frame, `df = pd.DataFrame(pd.date_range(start = "2020-11-19 00:00", end = "2020-11-19 23:59",periods = 15), columns = ["start"])` how can I implement it to this part? – aiox Nov 19 '20 at 23:08
  • The quoted block of my code is simply to create a minimum sample DataFrame. Something to allow my answer to be tried out immediately. I don't know the structure of the DataFrame you're working with, so I created the bare minimum: just the datetime column named "start". If you add more information about the layout of your DataFrame, I can edit my answer accordingly. – flom Nov 20 '20 at 06:38