
I'm trying to run a custom function with apply on a resampled object. The tricky part is that the function loops over each timestamp (row) of the DataFrame it receives and, for each one, performs operations based on the values of other columns at that timestamp. It should then return a DataFrame with the same row count as its input (my toy example doesn't do that yet; it just returns a list). The logic in the example below is much simpler than in my real use case.

I'm getting IndexingError: Too many indexers.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 100, 10),
                   'b': np.random.randint(0, 1000, 10),
                   'c': np.random.uniform(0, 100, 10)},
                  index=pd.date_range("2021-01-01", "2021-01-10"))

def test_func(df):
    new_ser = []
    for i in range(df.shape[0]):
        if i==0:
            new_ser.append(np.nan)  # np.NaN alias was removed in NumPy 2.0
        if df.iloc[i,:]['a'] < df.iloc[i,:]['b']:
            new_ser.append(1)
        else:
            new_ser.append(0)

    return new_ser

df.resample('2D').apply(test_func)
IndexingError: Too many indexers
matsuo_basho

1 Answer

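The problem is df.iloc[i,:]['a']. The value passed into Resampler.apply is not a DataFrame. It is one resampled column of the original DataFrame at a time, passed as a Series. Because a Series has only one axis, the two-part indexer iloc[i, :] raises IndexingError: Too many indexers. For example, for the first 2D bin, columns a and b arrive as: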

2021-01-01    81
2021-01-02    90
Freq: D, Name: a, dtype: int64

2021-01-01    395
2021-01-02    845
Freq: D, Name: b, dtype: int64
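To isolate the failure, here is a minimal sketch that rebuilds one such Series (values copied from the output above; the variable names are mine, not from the question) and tries both a two-part and a single positional indexer on it:

```python
import pandas as pd

# One resampled bin of column 'a', shaped like what Resampler.apply hands over.
s = pd.Series([81, 90], index=pd.date_range("2021-01-01", periods=2), name="a")

try:
    s.iloc[0, :]           # two indexers on a one-axis object
except Exception as err:   # pandas raises IndexingError here
    error_text = f"{type(err).__name__}: {err}"

print(error_text)          # IndexingError: Too many indexers
print(s.iloc[0])           # a single positional indexer works: 81
```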

You probably want groupby(pd.Grouper(freq='2D')).apply() instead, which passes each bin to the function as a whole DataFrame:

def test_func(df):
    new_ser = []
    for i in range(df.shape[0]):
        if i==0:
            new_ser.append(np.nan)  # np.NaN alias was removed in NumPy 2.0
        if df.iloc[i,:]['a'] < df.iloc[i,:]['b']:
            new_ser.append(1)
        else:
            new_ser.append(0)
    return new_ser

out = df.groupby(pd.Grouper(freq='2D')).apply(test_func)
print(out)


2021-01-01    [nan, 1, 1]
2021-01-03    [nan, 1, 1]
2021-01-05    [nan, 1, 1]
2021-01-07    [nan, 1, 1]
2021-01-09    [nan, 1, 1]
Freq: 2D, dtype: object
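The question's real goal is a result with one row per input row. The toy test_func appends both NaN and a flag for the first row, which is why each list above has three items for two rows. Below is a minimal sketch that assumes the first row of each bin should become NaN instead of gaining an extra entry. flag_rows is a hypothetical same-length variant of test_func, and the bins come from iterating the Resampler, which yields (label, DataFrame) pairs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": np.random.randint(0, 100, 10),
     "b": np.random.randint(0, 1000, 10),
     "c": np.random.uniform(0, 100, 10)},
    index=pd.date_range("2021-01-01", "2021-01-10"),
)

def flag_rows(group):
    # Hypothetical same-length variant of test_func: exactly one value per row,
    # with the first row of each bin set to NaN (assumed intent).
    flags = []
    for i in range(len(group)):
        if i == 0:
            flags.append(np.nan)
        elif group["a"].iloc[i] < group["b"].iloc[i]:
            flags.append(1)
        else:
            flags.append(0)
    return pd.Series(flags, index=group.index, name="flag")

# Iterating a Resampler yields (bin_label, DataFrame) pairs, so every column
# is available inside flag_rows.
pieces = [flag_rows(group) for _, group in df.resample("2D")]
out = pd.concat(pieces)
print(out)
```

pd.concat stitches the per-bin Series back together, so out shares df's index and can be attached as a column, e.g. df['flag'] = out.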
Ynjxsjmh
  • Your solution works, but I don't understand why the resample approach doesn't. If I iterate through the groups on a resample object, the object is a dataframe. – matsuo_basho May 26 '22 at 15:34
  • @matsuo_basho Iterate and apply are different – Ynjxsjmh May 26 '22 at 15:36
  • Ok, so I think you're saying the object that's passed to the apply function in my original version is just the first column? And this happens when resample + apply are used? – matsuo_basho May 26 '22 at 15:52
  • @matsuo_basho Not just the first column but column by column. – Ynjxsjmh May 26 '22 at 15:53
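The comment thread comes down to which object each access path hands to your code. The sketch below records the type received under the three approaches discussed here. arg_types and recorder are hypothetical helper names, and the recorder returns len(x) only so that every aggregation succeeds:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": np.random.randint(0, 100, 10),
     "b": np.random.randint(0, 1000, 10),
     "c": np.random.uniform(0, 100, 10)},
    index=pd.date_range("2021-01-01", "2021-01-10"),
)

def arg_types(call):
    # Hypothetical helper: run `call` with a recorder function and return
    # the set of type names the recorder was given.
    seen = set()

    def recorder(x):
        seen.add(type(x).__name__)
        return len(x)

    call(recorder)
    return seen

# Iterating the Resampler: each group is a DataFrame.
iter_types = {type(group).__name__ for _, group in df.resample("2D")}
# Resampler.apply: the function is called column by column with Series.
resample_apply_types = arg_types(lambda f: df.resample("2D").apply(f))
# groupby(pd.Grouper).apply: the function gets each bin as a DataFrame.
grouper_apply_types = arg_types(lambda f: df.groupby(pd.Grouper(freq="2D")).apply(f))

print("iterate resample:", iter_types)
print("resample.apply:", resample_apply_types)
print("groupby(Grouper).apply:", grouper_apply_types)
```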