I am running a very large function on a very large file (30 GB). Because Python is slow, I decided to try implementing this function in Numba. From my initial reading, Numba's datetime support seems very limited: it only works with np.datetime64 objects, and only supports timedeltas and very basic np.datetime64 operations.

One of the columns in the file holds datetimes. I need to check whether the day has changed (a new day is defined as starting at 5:00 pm in the dataset's timezone) and perform some operations when it has. Unfortunately, I have not found a clean way to run this check directly on the numpy datetime64 values, and was wondering whether there is one.

Currently, the function takes in separate integer arrays for year, month, week, weekday, day, hour, minute, and second. This is how I work with time inside the Numba function, and it is very inefficient.

# What I have right now:
import numba as nb

@nb.jit
def check(hour):
    for i in range(1, len(hour) - 1):
        if hour[i-1] == 4 and hour[i] == 5:
            pass  # run code

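# What I would like (timestamp is a numpy datetime64 array;
# hour() is a hypothetical helper that returns the hour of a datetime64 value):
@nb.jit
def check(timestamp):
    for i in range(1, len(timestamp) - 1):
        if hour(timestamp[i-1]) == 4 and hour(timestamp[i]) == 5:
            pass  # run code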



The goal is to get the same result as I do now, without the function needing the integer array variables.
  • tried the `pandas` or `numpy` solutions proposed in [this Q&A](https://stackoverflow.com/questions/13648774/get-year-month-or-day-from-numpy-datetime64)? – FObersteiner Jul 24 '19 at 14:22
  • I currently use pandas for my solution, but it is kind of odd to use multiple times (have to pull in an hour and date column), so I am trying to find a solution where I just use numpy, and I saw the numpy solution, but I have not gotten it to work. Very weird errors. Still trying. – Brendan Newell Jul 24 '19 at 14:37
  • I believe that this function does not work in conjunction with numba. The numpy solution. The pandas one definitely does not. – Brendan Newell Jul 24 '19 at 15:06
  • 1
    Yes, this function is not usable in numba, I need a function that I can use with an nb.jit wrapper or it will be too slow – Brendan Newell Jul 24 '19 at 15:41

1 Answer

I think the basic rule of Numba is "don't use objects!"

Extract the date components outside of Numba, for example with pandas, and pass them to the jitted function as a 2D integer array:

import numpy as np
import pandas as pd

# One row per timestamp, with columns [year, month, day]
dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', '2012-01-15'])
year_month_days = np.stack([dates.year, dates.month, dates.day], axis=1)
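
If the only thing needed is the 5:00 pm day-change check, the same "integers only" idea can probably be pushed further: convert the datetime64 column to seconds since the epoch once, outside of Numba, and do plain integer arithmetic inside the jitted loop. The following is a minimal sketch under some assumptions, not a tested solution for the 30 GB file: the timestamps are assumed to already be naive local times in the dataset's timezone, the names `day_change_indices` and `ROLLOVER_SECONDS` are made up for illustration, and the `try`/`except` fallback is only there so the example still runs where Numba is not installed.

```python
import numpy as np

try:
    import numba as nb
    jit = nb.njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def jit(func):
        return func

SECONDS_PER_DAY = 86_400
ROLLOVER_SECONDS = 17 * 3_600  # assumption: a new "day" starts at 17:00 local time


@jit
def day_change_indices(epoch_seconds):
    # Collect every index i whose row lies in a different
    # 17:00-to-17:00 day than row i - 1.
    out = np.empty(len(epoch_seconds), dtype=np.int64)
    n = 0
    for i in range(1, len(epoch_seconds)):
        # Shifting by the rollover makes each "day" an ordinary 86400 s bucket.
        prev_day = (epoch_seconds[i - 1] - ROLLOVER_SECONDS) // SECONDS_PER_DAY
        curr_day = (epoch_seconds[i] - ROLLOVER_SECONDS) // SECONDS_PER_DAY
        if curr_day != prev_day:
            out[n] = i  # the "run code" branch would go here
            n += 1
    return out[:n]


# Made-up sample data, assumed to be local time already.
timestamps = np.array(
    [
        "2019-07-24T16:30:00",
        "2019-07-24T16:59:59",
        "2019-07-24T17:00:00",
        "2019-07-25T09:00:00",
        "2019-07-25T17:05:00",
    ],
    dtype="datetime64[s]",
)

# datetime64[s] -> int64 seconds since 1970-01-01; done once, outside Numba.
epoch_seconds = timestamps.astype(np.int64)
changes = day_change_indices(epoch_seconds)
```

Comparing day buckets instead of testing for the hour going from 4 to 5 also catches a rollover when there is a gap in the data around 5:00 pm. A pandas column is usually `datetime64[ns]`, so it would need `.astype("datetime64[s]")` first (or the constants expressed in nanoseconds). If the hour itself is still needed, `(epoch_seconds // 3600) % 24` gives it as an integer.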
Yu Kobayashi