
I am having trouble indexing a pandas DataFrame that has a pandas.DatetimeIndex as its index.

The problem arises when I try to index the DataFrame with a list of labels using the .loc accessor. By contrast, indexing with a list of integer positions through .iloc works.

Here is the code to reproduce the issue:

from __future__ import print_function
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
import numpy as np

data = StringIO("""
timestamp,value
2014-04-02 14:29:00,42.652
2014-04-02 14:34:00,41.361
2014-04-02 14:39:00,-68.408
2014-04-02 14:44:00,40.262
2014-04-02 14:59:00,-89.836
2014-04-02 15:04:00,42.579
""")

anomalies = ['2014-04-02 14:39:00', '2014-04-02 14:59:00']

df = pd.read_csv(data, parse_dates=['timestamp'], index_col='timestamp')

# Works
print("1)")
print(df.loc[anomalies[0]])
print(df.loc[anomalies[1]])

# Works
print("\n2)")
anomalies_indexes = [np.argwhere(df.index == a).item() for a in anomalies]
print(anomalies_indexes)  # prints [2, 4]
print(df.iloc[anomalies_indexes, :])

# Does not work -> throws KeyError
print("\n3)")
print(df.loc[anomalies, :])

I am using Python 3.7.2 and pandas 0.23.4 on my machine, but the same behavior happens on Repl.it with Python 3.7.4 and pandas 0.25.1 (try this Repl.it) and with Python 2.7.16 and pandas 0.24.2 (try this Repl.it), which are the default versions for Python 2 and 3 environments on Repl.it at the time of writing.

Can you spot any error in my code or tell me what I am missing?

[EDIT: Answer]

The solution is to convert the strings into datetime objects, as suggested in the comments (thanks to user @anky_91):

anomalies = [pd.to_datetime(a) for a in anomalies]
print(df.loc[anomalies, :])  # Now this works
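
As a minimal, self-contained sketch of the fix (not a definitive implementation), the whole list can also be converted in one call with `pd.to_datetime`, which returns a DatetimeIndex when given a list. The names `anomaly_labels`, `selected`, `mask`, and `masked` are introduced here purely for illustration. The boolean-mask variant with `Index.isin` is an alternative not discussed in the question; it is shown on the assumption that silently skipping labels absent from the index is acceptable.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question
data = StringIO("""timestamp,value
2014-04-02 14:29:00,42.652
2014-04-02 14:34:00,41.361
2014-04-02 14:39:00,-68.408
2014-04-02 14:44:00,40.262
2014-04-02 14:59:00,-89.836
2014-04-02 15:04:00,42.579
""")
df = pd.read_csv(data, parse_dates=["timestamp"], index_col="timestamp")

anomalies = ["2014-04-02 14:39:00", "2014-04-02 14:59:00"]

# Convert the whole list at once so the labels have the same type
# as the entries of df.index (timestamps rather than strings)
anomaly_labels = pd.to_datetime(anomalies)

# Label-based selection with a list-like of timestamps
selected = df.loc[anomaly_labels, :]
print(selected)

# Alternative (assumption: missing labels should be skipped, not raise):
# build a boolean mask over the index and filter with it
mask = df.index.isin(anomaly_labels)
masked = df[mask]
print(masked)
```

Whether a plain list of strings is accepted by `.loc` on a DatetimeIndex may depend on the pandas version, so converting explicitly keeps the lookup independent of that behavior.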
IvanProsperi94
    that is because the list `anomalies` has string entries but the `df.index` is a datetimeindex, try `df.loc[[pd.to_datetime(i) for i in anomalies]]` to see that works – anky Sep 07 '19 at 12:44
    @anky_91 yes that was the problem. I can't say I did not think about it, but as indexing with a single string worked, I expected it to work with a list, so I did not try.. Shame on me. And thank you guys for the fast answer. – IvanProsperi94 Sep 07 '19 at 12:51
    `pd.to_datetime(anomalies)` would work as well. – Quang Hoang Sep 07 '19 at 13:48
