Pandas does not restrict the keys of a DatetimeIndex to Timestamps. Why is that, and is there any way to enforce such a restriction?

import pandas as pd

df = pd.DataFrame({"A":{"2019-01-01":12.0,"2019-01-03":27.0,"2019-01-04":15.0},
                   "B":{"2019-01-01":25.0,"2019-01-03":27.0,"2019-01-04":27.0}}
                 )
df.index = pd.to_datetime(df.index)
df.loc['2010-05-05'] = 1 # string index
df.loc[150] = 1 # integer index
print(df)

I get the following dataframe:

                        A     B
2019-01-01 00:00:00  12.0  25.0
2019-01-03 00:00:00  27.0  27.0
2019-01-04 00:00:00  15.0  27.0
2010-05-05            1.0   1.0
150                   1.0   1.0

Of course, I cannot run

df.index = pd.to_datetime(df.index)

again, because of the last two rows. I would prefer that adding those two rows raised an error instead. Is that possible?

viscacha

2 Answers

You have a slight misconception about the type of your index. It is not a DatetimeIndex:

>>> df.index
Index([2019-01-01 00:00:00, 2019-01-03 00:00:00, 2019-01-04 00:00:00,
              '2010-05-05',                 150],
      dtype='object')

The index becomes an object-dtype Index as soon as you add a value of a different type. A DatetimeIndex cannot hold anything other than timestamps, so pandas changes the type of the index instead of rejecting the value.
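
The type change can be seen in isolation with a minimal sketch that builds the mixed index directly rather than through df.loc (the variable names here are illustrative only):

```python
import pandas as pd

# A homogeneous collection of timestamps is stored as a DatetimeIndex.
dates = pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-04"])
print(type(dates).__name__, dates.dtype)

# Mixing timestamps with a string and an integer can only be represented
# as a generic object-dtype Index.
mixed = pd.Index(list(dates) + ["2010-05-05", 150])
print(type(mixed).__name__, mixed.dtype)

is_datetime_before = isinstance(dates, pd.DatetimeIndex)
is_datetime_after = isinstance(mixed, pd.DatetimeIndex)
```

An isinstance check against pd.DatetimeIndex is therefore a quick way to find out whether an index has silently been widened.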


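If you would like to get rid of all values in your index that are not datetimes, you can first convert them to NaT with pd.to_datetime and errors='coerce'. Note that the string '2010-05-05' is parsed into a valid timestamp, so only 150 becomes NaT: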

df.index = pd.to_datetime(df.index, errors='coerce')

               A     B
2019-01-01  12.0  25.0
2019-01-03  27.0  27.0
2019-01-04  15.0  27.0
2010-05-05   1.0   1.0
NaT          1.0   1.0

To access only elements that have a valid Timestamp as index, you can use notnull:

df[df.index.notnull()]

               A     B
2019-01-01  12.0  25.0
2019-01-03  27.0  27.0
2019-01-04  15.0  27.0
2010-05-05   1.0   1.0
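
Put together, the two steps might look like the following self-contained sketch. It assumes a small made-up frame whose index contains one label that cannot be parsed ("not a date" is a placeholder, not a value from the question):

```python
import pandas as pd

# Object-dtype index: two timestamps and one label that is not a date.
df = pd.DataFrame(
    {"A": [12.0, 27.0, 1.0]},
    index=pd.Index(
        [pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-03"), "not a date"],
        dtype=object,
    ),
)

# Labels that cannot be parsed become NaT instead of raising.
df.index = pd.to_datetime(df.index, errors="coerce")

# Drop the NaT rows; what remains has a clean DatetimeIndex.
cleaned = df[df.index.notnull()]
print(cleaned)
```

This cleans up after the fact; it does not stop the bad row from being added in the first place.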
user3483203

You can check whether each index value is a pd.Timestamp instance (pd.Timestamp is the public name for pd._libs.tslibs.timestamps.Timestamp):

# keep only the rows whose index label is a Timestamp
flags = [isinstance(idx, pd.Timestamp) for idx in df.index]
df = df[flags]

However, note that both pd.to_datetime('2010-05-05') and pd.to_datetime(150) are valid calls: each returns a valid timestamp without raising an error.
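
If the goal is to reject such rows up front rather than filter them out afterwards, one option is to route row assignments through a small wrapper that applies the same isinstance check before writing. This is only a sketch: set_row_strict is a hypothetical helper, not part of pandas, and as far as I know pandas has no built-in switch for this.

```python
import pandas as pd

def set_row_strict(df, key, value):
    """Hypothetical helper: add or overwrite a row only if `key` is a pd.Timestamp."""
    # pd.NaT is not a pd.Timestamp instance, so it is rejected as well.
    if not isinstance(key, pd.Timestamp):
        raise TypeError(
            f"index key must be a pd.Timestamp, got {type(key).__name__}"
        )
    df.loc[key] = value

df = pd.DataFrame(
    {"A": [12.0, 27.0], "B": [25.0, 27.0]},
    index=pd.to_datetime(["2019-01-01", "2019-01-03"]),
)

# A real Timestamp key is accepted.
set_row_strict(df, pd.Timestamp("2010-05-05"), 1)

# String and integer keys are refused before they can reach the index.
rejected = []
for bad_key in ("2010-05-05", 150):
    try:
        set_row_strict(df, bad_key, 1)
    except TypeError:
        rejected.append(bad_key)
```

The guard only covers writes made through the helper; a direct df.loc[...] = ... elsewhere in the code would still bypass it.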

Quang Hoang
  • It's interesting, though: pd.to_datetime(150) works, but in a non-scalar context (like my example) it can raise an error because of the mix of integers and timestamps. – viscacha Mar 28 '19 at 20:57
  • Internally, a timestamp is stored as an int64, which I believe is the number of nanoseconds since the epoch. – Quang Hoang Mar 28 '19 at 21:00
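
The nanosecond interpretation mentioned in the last comment can be checked with a short sketch (variable names are illustrative):

```python
import pandas as pd

# A bare integer passed to pd.to_datetime is read as nanoseconds since the epoch.
ts = pd.to_datetime(150)
print(ts)

# Timestamp.value exposes the underlying integer nanosecond count.
nanos = ts.value

# Passing unit="s" makes the same integer mean seconds instead.
ts_seconds = pd.to_datetime(150, unit="s")
print(ts_seconds)
```

So 150 on its own is a perfectly valid, if unlikely, timestamp just after the start of 1970.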