
I have a dataframe with a DatetimeIndex and I want to find the maximum element in each rolling window. I also need to know the index of each maximum in the source dataframe. Example data:

data = pd.DataFrame(
    index=pd.date_range(start=pd.to_datetime('2010-10-10 12:00:00'),
                        periods=10, freq='H'),
    data={'value': [3, 2, 1, 0, 5, 1, 1, 1, 1, 1]}
)

If I use rolling with max, I lose the indexes:

data.rolling(3).max()

Out:

                     value
2010-10-10 12:00:00    NaN
2010-10-10 13:00:00    NaN
2010-10-10 14:00:00    3.0
2010-10-10 15:00:00    2.0
2010-10-10 16:00:00    5.0
2010-10-10 17:00:00    5.0
2010-10-10 18:00:00    5.0
2010-10-10 19:00:00    1.0
2010-10-10 20:00:00    1.0
2010-10-10 21:00:00    1.0

If I try to use argmax, I get integer positions relative to each window. What I need are the source datetime indexes, or at least integer positions in the source dataframe so that I can look them up with iloc:

data.rolling(3).apply(lambda x: x.argmax())

Out:

                     value
2010-10-10 12:00:00    NaN
2010-10-10 13:00:00    NaN
2010-10-10 14:00:00    0.0
2010-10-10 15:00:00    0.0
2010-10-10 16:00:00    2.0
2010-10-10 17:00:00    1.0
2010-10-10 18:00:00    0.0
2010-10-10 19:00:00    0.0
2010-10-10 20:00:00    0.0
2010-10-10 21:00:00    0.0

Could anyone help me find a suitable pandas function or set of parameters for this?

Of course, I can use a for loop like this:

window = 3
target_var = 'value'

pd.DataFrame([{'value_max': data[ind: ind + window][target_var].max(),
               'source_index': data[ind: ind + window].index[data[ind: ind + window][target_var].values.argmax()]
              } for ind in range(1, len(data) + 1 - window)],
             index=data.index[1:-window+1])

And it works. But I would like to find a more elegant solution with pandas.

Desired output:

                           source_index  value_max
2010-10-10 13:00:00 2010-10-10 13:00:00          2
2010-10-10 14:00:00 2010-10-10 16:00:00          5
2010-10-10 15:00:00 2010-10-10 16:00:00          5
2010-10-10 16:00:00 2010-10-10 16:00:00          5
2010-10-10 17:00:00 2010-10-10 17:00:00          1
2010-10-10 18:00:00 2010-10-10 18:00:00          1
2010-10-10 19:00:00 2010-10-10 19:00:00          1

1 Answer


Use Rolling.agg with a custom function, because idxmax is not implemented for rolling windows yet:

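import numpy as np
import pandas as pd

def idx(x):
    # x is the window as a Series; return the index label of its maximum
    return x.index.values[np.argmax(x.values)]

df = data['value'].rolling(3).agg(['max', idx])
# convert the aggregated column back to datetimes
df['idx'] = pd.to_datetime(df['idx'])
print(df)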
                     max                 idx
2010-10-10 12:00:00  NaN                 NaT
2010-10-10 13:00:00  NaN                 NaT
2010-10-10 14:00:00  3.0 2010-10-10 12:00:00
2010-10-10 15:00:00  2.0 2010-10-10 13:00:00
2010-10-10 16:00:00  5.0 2010-10-10 16:00:00
2010-10-10 17:00:00  5.0 2010-10-10 16:00:00
2010-10-10 18:00:00  5.0 2010-10-10 16:00:00
2010-10-10 19:00:00  1.0 2010-10-10 17:00:00
2010-10-10 20:00:00  1.0 2010-10-10 18:00:00
2010-10-10 21:00:00  1.0 2010-10-10 19:00:00

Thank you, @Sandeep Kadapa, for the improved solution:

def idx(x):
    return x.idxmax().to_datetime64()
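
As an alternative sketch that avoids returning datetimes from the aggregation, you can compute the position of the maximum inside each window with `np.argmax` and turn it into a position in the source frame. The names `window`, `offset`, `window_start`, `pos` and `result` are only illustrative, and this assumes the default trailing windows, where each window is labelled by its last row:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    index=pd.date_range(start="2010-10-10 12:00:00", periods=10,
                        freq=pd.Timedelta(hours=1)),
    data={"value": [3, 2, 1, 0, 5, 1, 1, 1, 1, 1]},
)

window = 3  # illustrative window size, same as in the question
values = data["value"]

# Offset of the maximum inside each window (NaN while the window is incomplete).
offset = values.rolling(window).apply(np.argmax, raw=True).to_numpy()
valid = ~np.isnan(offset)

# Position in `data` of the first row of the trailing window ending at each row.
window_start = np.arange(len(values)) - (window - 1)

# Window start + offset inside the window = position in the source frame.
pos = window_start[valid] + offset[valid].astype(int)

result = pd.DataFrame(
    {"value_max": values.to_numpy()[pos], "source_index": data.index[pos]},
    index=data.index[valid],
)
print(result)
```

For this data it gives the same maxima and source timestamps as the output above, and `pos` can be passed directly to `data.iloc` if integer positions are enough.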
jezrael
  • I'm sorry, maybe I was unclear: I can already find the indexes within each window with argmax, but I need the source DateTime indexes (as in the `for` loop, but done with pandas). – Anna Iliukovich-Strakovskaia Dec 28 '18 at 09:52
  • One small workaround for `idx` function is to use `return x.idxmax().to_datetime64()` – Space Impact Dec 28 '18 at 10:20
  • IIUC, the `agg` function expects `numpy` data types as output and otherwise fails to return a result. In this case `x.idxmax()` is a `pandas` data type, whereas `x.idxmax().to_datetime64()` is a `numpy` data type and works fine. – Space Impact Dec 28 '18 at 10:30