15
import pandas as pd

d = {'Dates': [pd.Timestamp('2013-01-02'),
               pd.Timestamp('2013-01-03'),
               pd.Timestamp('2013-01-04')],
     'Num1': [1, 2, 3],
     'Num2': [-1, -2, -3]}

df = pd.DataFrame(data=d)

We have this data frame

                 Dates  Num1  Num2
0  2013-01-02 00:00:00     1    -1
1  2013-01-03 00:00:00     2    -2
2  2013-01-04 00:00:00     3    -3

Dates    datetime64[ns]
Num1              int64
Num2              int64
dtype: object

This gives me

df['Dates'].isin([pd.Timestamp('2013-01-04')])  

0    False
1    False
2    False
Name: Dates, dtype: bool  

I am expecting True for the date "2013-01-04". What am I missing? I am using the latest version of pandas (0.12).

DSM
DataByDavid

5 Answers

3

This worked for me.

df['Dates'].isin(np.array([pd.Timestamp('2013-01-04')]).astype('datetime64[ns]')) 

It is a bit verbose, but it works if you need a workaround right now. See https://github.com/pydata/pandas/issues/5021 for more details.
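A shorter way to build the same kind of lookup array is sketched below. It assumes a reasonably recent pandas where `pd.to_datetime` and `.to_numpy()` are available; it is not taken from the linked issue, and the column names are just the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Dates': pd.to_datetime(['2013-01-02', '2013-01-03', '2013-01-04']),
    'Num1': [1, 2, 3],
})

# Convert the lookup values to a NumPy datetime64 array before calling isin,
# so both sides of the comparison share one representation.
wanted = pd.to_datetime(['2013-01-04']).to_numpy()

mask = df['Dates'].isin(wanted)
matched = df.loc[mask, 'Num1'].tolist()
```

Here `mask` can be used directly for boolean indexing, as in `df[mask]`.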

livinston
2

I have the same version of pandas, and @DSM's answer was helpful. Another workaround would be to use the apply method:

>>> df.Dates.apply(lambda date: date in [pd.Timestamp('2013-01-04')])

0    False
1    False
2     True
Name: Dates, dtype: bool
JoC
1

Yep, that looks like a bug to me. It comes down to this part of lib.ismember:

for i in range(n):
    val = util.get_value_at(arr, i)
    if val in values:
        result[i] = 1
    else: 
        result[i] = 0

val is a numpy.datetime64 object, and values is a set of Timestamp objects. Testing membership should work, but doesn't:

>>> import pandas as pd, numpy as np
>>> ts = pd.Timestamp('2013-01-04')
>>> ts
Timestamp('2013-01-04 00:00:00', tz=None)
>>> dt64 = np.datetime64(ts)
>>> dt64
numpy.datetime64('2013-01-03T19:00:00.000000-0500')
>>> dt64 == ts
True
>>> dt64 in [ts]
True
>>> dt64 in {ts}
False

I think usually that behaviour -- working in a list, not working in a set -- is due to something going wrong with __hash__:

>>> hash(dt64)
1357257600000000
>>> hash(ts)
-7276108168457487299

You can't do membership testing in a set if the hashes aren't the same. I can think of a few ways to fix this, but choosing the best one would depend upon design choices they made when implementing Timestamps that I'm not qualified to comment on.
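The list-versus-set difference can be reproduced without pandas at all. The sketch below uses two hypothetical classes, `Raw` and `Stamp`, which are stand-ins invented for illustration (they are not the real `datetime64` or `Timestamp` types): they compare equal to each other but deliberately hash differently.

```python
class Raw:
    """Hypothetical stand-in for a datetime64-like value."""
    def __init__(self, ns):
        self.ns = ns

    def __eq__(self, other):
        return getattr(other, "ns", None) == self.ns

    def __hash__(self):
        return hash(self.ns)


class Stamp:
    """Hypothetical stand-in for a Timestamp-like value."""
    def __init__(self, ns):
        self.ns = ns

    def __eq__(self, other):
        return getattr(other, "ns", None) == self.ns

    def __hash__(self):
        # Deliberately inconsistent with Raw.__hash__ for equal values.
        return hash(self.ns + 1)


raw, stamp = Raw(1357257600), Stamp(1357257600)

equal = raw == stamp       # equality holds
in_list = raw in [stamp]   # list membership only needs ==
in_set = raw in {stamp}    # set membership compares hashes first
```

With these definitions `in_list` is True and `in_set` is False, which mirrors the `dt64 in [ts]` versus `dt64 in {ts}` results above.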

DSM
  • What questions do you have about `Timestamp`? I might be able to answer them. I've just been working on that part of pandas. – Phillip Cloud Sep 28 '13 at 19:31
  • @PhillipCloud: how tightly coupled are Timestamps and datetime64 objects? Do they have the same underlying precision? Would it make sense to make sure they had the same hash, or would they need to be coerced to a canonical kind instead? – DSM Sep 28 '13 at 20:07
  • 1) They aren't coupled at all really, modulo some instance checks when comparing them. `Timestamp` is a subclass of `datetime.datetime`. 2) No. `Timestamp` only goes to nanoseconds. 3) Probably not. A `Timestamp` and a `datetime.datetime` have the same hash if a `Timestamp` has 0 for its `nanoseconds` field (`Timestamp` calls the `__hash__` method of `datetime.datetime` objects if this is the case). – Phillip Cloud Sep 28 '13 at 20:14
  • BTW you can get the `datetime64` version of a `Timestamp` via its `asm8` field. E.g., `Timestamp('now').asm8`. – Phillip Cloud Sep 28 '13 at 20:19
  • That's what I was afraid of, that there was no easy way to view one as a superset of the other: `somedt64 in {ts0, ts1}` is going to be a bit of a headache, then. – DSM Sep 28 '13 at 20:21
  • NumPy hasn't had sane datetime64 support throughout all of pandas history, otherwise `Timestamp` might've subclassed from it originally. – Phillip Cloud Sep 28 '13 at 20:25
  • In any case, I'll create an issue. Definitely worth some discussion. – Phillip Cloud Sep 28 '13 at 20:25
  • Re `list` vs `set` behavior: the [sequence docs](http://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) state that `in` uses equality comparison for `list` and friends. `set` compares hashes. (your example above is a nice illustration of this). – Phillip Cloud Sep 28 '13 at 20:31
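As a small illustration of the `asm8` comment above, the sketch below converts a `Timestamp` to its `numpy.datetime64` form and back. It assumes a pandas version that exposes `Timestamp.asm8`, as described in the comment; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp('2013-01-04')

# asm8 gives the NumPy scalar that represents the same instant.
dt64 = ts.asm8

is_numpy_type = isinstance(dt64, np.datetime64)
same_instant = dt64 == np.datetime64('2013-01-04')

# Wrapping the NumPy scalar again gives back an equal Timestamp.
round_trip = pd.Timestamp(dt64) == ts
```

Converting every lookup value this way before building a set is one possible way to keep a single canonical type on both sides of the membership test.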
1

For some reason, when your dates carry a time component, `isin` does not match them correctly. Try:

df['Dates'] = df['Dates'].dt.normalize()
df['Dates'].isin([pd.Timestamp('2013-01-04')])  

You will lose the time component of your datetimes, but if the time doesn't matter, this actually works :).
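A minimal sketch of the effect, assuming a pandas version that has the `.dt` accessor (the sample times are invented for illustration): values with a time of day do not match a midnight `Timestamp` until they are normalized.

```python
import pandas as pd

# Two datetimes that carry a time-of-day component.
s = pd.Series(pd.to_datetime(['2013-01-03 08:30', '2013-01-04 17:45']))
target = [pd.Timestamp('2013-01-04')]

# The raw values are compared exactly, so 17:45 is not midnight.
raw_mask = s.isin(target)

# normalize() resets every time to 00:00:00 before the comparison.
norm_mask = s.dt.normalize().isin(target)
```

Assigning the normalized values back to the column, as in the answer, discards the original times, so keeping them in a separate column may be preferable when they are still needed.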

holydragon
Gabriel
0

I found using strings worked better in my case:

>>> df['Dates'].isin(['2013-01-04'])
0    False
1    False
2     True
Name: Dates, dtype: bool
>>> df_qry = df['Dates'][df['Num1'] >= 2]
>>> df_qry
1   2013-01-03
2   2013-01-04
Name: Dates, dtype: datetime64[ns]
>>> df_mask = df['Dates'].isin(df_qry.astype(str))
>>> df_mask
0    False
1     True
2     True
Name: Dates, dtype: bool
>>> df[df_mask]
       Dates  Num1  Num2
1 2013-01-03     2    -2
2 2013-01-04     3    -3

As a side note, this was very handy for setting rangebreaks on plotly time series, for example:

fig.update_yaxes(rangebreaks=[dict(values=df.index[df_mask].astype(str))])
Tomerikoo