
I have the following two arrays of datetimes:

datesA:

datesA
array([datetime.datetime(2000, 1, 4, 0, 0),
       datetime.datetime(2000, 1, 5, 0, 0),
       datetime.datetime(2000, 1, 6, 0, 0),
       datetime.datetime(2000, 1, 7, 0, 0),
       datetime.datetime(2000, 1, 8, 0, 0),
       datetime.datetime(2000, 1, 9, 0, 0),
       datetime.datetime(2000, 1, 10, 0, 0),
       datetime.datetime(2000, 1, 11, 0, 0),
       datetime.datetime(2000, 1, 12, 0, 0)], dtype=object)

And datesB:

datesB
array([datetime.datetime(2000, 1, 4, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 5, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 6, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 7, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 10, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 11, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 12, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 13, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2000, 1, 14, 0, 0, tzinfo=<UTC>)], dtype=object)

I want to find the dates in datesA that are NOT in datesB. Using ~np.isin() as below returns True for every row instead of only the rows not in datesB:

datesA_not_in_datesB = ~np.isin(datesA,datesB)

datesA_not_in_datesB.reshape(-1,1)
array([[ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True]])

datesA rows 4 and 5 (0-based), i.e. datetime.datetime(2000, 1, 8, 0, 0) and datetime.datetime(2000, 1, 9, 0, 0), are the only records that are not in datesB, so they are the only ones that should return True.
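A likely cause, sketched with the standard library: Python never treats a naive `datetime` and a timezone-aware one as equal. If `np.isin` compares these object arrays element by element with `==` (an assumption about its internals), every test comes back `False`, and `~` turns all of them into `True`. The snippet uses `datetime.timezone.utc` in place of the `<UTC>` object from the question; that substitution is an assumption.

```python
from datetime import datetime, timezone

naive = datetime(2000, 1, 4)                       # no tzinfo, like the datesA entries
aware = datetime(2000, 1, 4, tzinfo=timezone.utc)  # UTC-aware, like the datesB entries

# Equality between a naive and an aware datetime does not raise; it is simply False.
print(naive == aware)  # False

# Ordering comparisons between them raise TypeError instead.
try:
    naive < aware
except TypeError as exc:
    print("TypeError:", exc)

# Once both sides are naive, the same wall-clock time compares equal again.
print(naive == aware.replace(tzinfo=None))  # True
```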

I've found this issue of isin() not working for datetimes reported in other posts.

The fix suggested there is:

datesA_not_in_datesB = ~np.isin(datesA.astype('datetime64[ns]'),datesB.astype('datetime64[ns]'))
C:\Users\Username\anaconda3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
  """Entry point for launching an IPython kernel.

datesA_not_in_datesB.reshape(-1,1)
array([[False],
       [False],
       [False],
       [False],
       [ True],
       [ True],
       [False],
       [False],
       [False]])

That works except I get a warning message:

DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future """Entry point for launching an IPython kernel.

I have tried a few things to remove the timezone info from datesB with .replace(tzinfo=None), so that isin would work without having to use .astype('datetime64[ns]') and without a DeprecationWarning, but to no avail.
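One detail worth checking: `.replace(tzinfo=None)` is a method of each individual `datetime` object, and a NumPy array has no `replace` method, so the call has to be applied element by element. A minimal sketch, rebuilding datesB with `datetime.timezone.utc` instead of the question's `<UTC>` (an assumption):

```python
import numpy as np
from datetime import datetime, timezone

# datesB rebuilt with the stdlib UTC tzinfo (assumption: the question uses pytz's UTC)
datesB = np.array(
    [datetime(2000, 1, d, tzinfo=timezone.utc) for d in (4, 5, 6, 7, 10, 11, 12, 13, 14)],
    dtype=object,
)

# The ndarray itself has no .replace method...
print(hasattr(datesB, "replace"))  # False

# ...so strip the timezone from each datetime and build a new object array.
datesB_naive = np.array([d.replace(tzinfo=None) for d in datesB], dtype=object)

print(datesB_naive[0])         # 2000-01-04 00:00:00
print(datesB_naive[0].tzinfo)  # None
```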

Could someone advise on how to get the same result as

datesA_not_in_datesB = ~np.isin(datesA.astype('datetime64[ns]'),datesB.astype('datetime64[ns]'))

but in a way that doesn't result in a DeprecationWarning?

Thank you very much for your time and help with this.

noemiemich
  • Did you use `replace` to set it to `tzinfo=None`, or did you remove it completely to get `datetime.datetime(2000, 1, 4, 0, 0)`? Sorry, but it is a little unclear! :) – johnashu Apr 28 '20 at 13:18
  • Try, e.g., `datesB_mod = np.array([d.replace(tzinfo=None) for d in datesB])` and then `~np.isin(datesA.astype('datetime64[ns]'), datesB_mod.astype('datetime64[ns]'))`; that will work fine. – FObersteiner Apr 28 '20 at 13:23
  • Thanks for your patience, Johnashu. I thought that to remove it I had to apply `.replace(tzinfo=None)`; that's what I found in other posts like this one (https://stackoverflow.com/questions/41166093/remove-timezone-information-from-datetime-object/41166157). Could you please tell me how to remove the timezone info from datesB? Thank you very much for your time and help with this. – noemiemich Apr 28 '20 at 13:24
  • The point of the warning you get is "don't compare naive with aware datetime objects", so you should not work with both of them at the same time: either localize all of them or keep them all naive, e.g. if you *know* all of them are in UTC. – FObersteiner Apr 28 '20 at 13:28
  • Thank you very much @MrFuppes, that worked. Once the timezone info was removed, as Johnashu also pointed out, I no longer got the deprecation warning. Thanks a lot. – noemiemich Apr 28 '20 at 13:29

1 Answer


I removed the tzinfo=<UTC> from the datetime objects in datesB.

This code gave me no warnings.

Apologies if I missed the point!

import datetime

import numpy as np
from pytz import UTC

datesA = np.array([datetime.datetime(2000, 1, 4, 0, 0),
                   datetime.datetime(2000, 1, 5, 0, 0),
                   datetime.datetime(2000, 1, 6, 0, 0),
                   datetime.datetime(2000, 1, 7, 0, 0),
                   datetime.datetime(2000, 1, 8, 0, 0),
                   datetime.datetime(2000, 1, 9, 0, 0),
                   datetime.datetime(2000, 1, 10, 0, 0),
                   datetime.datetime(2000, 1, 11, 0, 0),
                   datetime.datetime(2000, 1, 12, 0, 0)], dtype=object)

datesB = np.array([datetime.datetime(2000, 1, 4, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 5, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 6, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 7, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 10, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 11, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 12, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 13, 0, 0, tzinfo=UTC),
                   datetime.datetime(2000, 1, 14, 0, 0, tzinfo=UTC)], dtype=object)

# Strip the timezone from every element so both arrays hold naive datetimes
datesB = np.array([d.replace(tzinfo=None) for d in datesB])

datesA_not_in_datesB = ~np.isin(datesA, datesB)

print(datesA_not_in_datesB)
# [False False False False  True  True False False False]

reshaped = datesA_not_in_datesB.reshape(-1, 1)

print(reshaped)
# [[False]
#  [False]
#  [False]
#  [False]
#  [ True]
#  [ True]
#  [False]
#  [False]
#  [False]]
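If you would rather keep the `datetime64[ns]` comparison from the question, a warning-free variant is to turn every aware value into a naive UTC value before building the `datetime64[ns]` array, so NumPy never has to parse a timezone-aware datetime. The helper `to_naive_utc` below is hypothetical, it assumes any naive values are already UTC, and it uses `datetime.timezone.utc` instead of pytz's `UTC` (an assumption):

```python
import numpy as np
from datetime import datetime, timezone

datesA = np.array([datetime(2000, 1, d) for d in range(4, 13)], dtype=object)
datesB = np.array(
    [datetime(2000, 1, d, tzinfo=timezone.utc) for d in (4, 5, 6, 7, 10, 11, 12, 13, 14)],
    dtype=object,
)

def to_naive_utc(arr):
    """Hypothetical helper: object array of datetimes -> naive UTC datetime64[ns].

    Aware values are converted to UTC with astimezone() and then stripped of
    their tzinfo; naive values pass through unchanged (assumed to be UTC).
    """
    naive = [
        d.astimezone(timezone.utc).replace(tzinfo=None) if d.tzinfo is not None else d
        for d in arr
    ]
    return np.array(naive, dtype="datetime64[ns]")

datesA_not_in_datesB = ~np.isin(to_naive_utc(datesA), to_naive_utc(datesB))
print(np.flatnonzero(datesA_not_in_datesB))  # [4 5]
```

The `astimezone` step only matters if datesB could hold offsets other than UTC; for the all-UTC data in the question, `.replace(tzinfo=None)` alone gives the same result.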
johnashu