2

Hi have a dataframe with datetimeindex showing latest first and oldest last. When using .resample.agg the order of the index turns around. I couldn't read anything in the docs. Why is that and how can I avoid/change it? Thank you.

Here I create an example:

index = pd.date_range(start='2021-06-20',end='2021-07-31',freq='1min')
index = index.sort_values(ascending=False)
randval = np.random.randint(10,50,size=(len(index),2),dtype=np.uint8)

df = pd.DataFrame(randval,index=index,columns=['min','max'])
df_filt = df.between_time('10:00','12:00')
df_resampled = df_filt.resample('10T').agg({'min':'sum','max':'mean'})
df_resampled = df_resampled.dropna()


print('Numpy Version: ',np.version.version,)
print('Pandas Version: ',pd.__version__,'\n')
print('RAW DATAFRAME:\n ',df,'\n')
print('FILTERED DATAFRAME:\n',df_filt,'\n')
print('RESAMPLED DATAFRAME WITH CHANGED ORDER:\n',df_resampled,)

I get the output:

Numpy Version:  1.21.1
Pandas Version:  1.3.0 

RAW DATAFRAME:
                      min  max
2021-07-31 00:00:00   20   33
2021-07-30 23:59:00   23   29
2021-07-30 23:58:00   18   45
...                  ...  ...
2021-06-20 00:02:00   48   28
2021-06-20 00:01:00   19   19
2021-06-20 00:00:00   33   16

[59041 rows x 2 columns] 

FILTERED DATAFRAME:
                     min  max
2021-07-30 10:30:00   14   38
2021-07-30 10:29:00   27   46
2021-07-30 10:28:00   41   17
...                  ...  ...
2021-06-20 10:02:00   30   12
2021-06-20 10:01:00   43   49
2021-06-20 10:00:00   43   22

[1271 rows x 2 columns] 

RESAMPLED DATAFRAME WITH CHANGED ORDER:
                       min   max
2021-06-20 10:00:00  366.0  27.9
2021-06-20 10:10:00  264.0  29.9
2021-06-20 10:20:00  236.0  34.6
...                    ...   ...
2021-07-30 10:10:00  310.0  32.0
2021-07-30 10:20:00  343.0  35.7
2021-07-30 10:30:00   14.0  38.0

[164 rows x 2 columns]


EDIT


I add a .sort_index to df_resample. This seems to work. For me still a bit weird to do it this way though.

df_resampled = df_resampled.sort_index(ascending=False)

# or all combined:

df_resampled = df_filt.resample('10T').agg({'min':'sum','max':'mean'}).dropna().sort_index(ascending=False)

gives this output:

RESAMPLED DATAFRAME:
                        min   max
2021-07-30 10:30:00   34.0  45.0
2021-07-30 10:20:00  321.0  29.1
2021-07-30 10:10:00  315.0  33.6
2021-07-30 10:00:00  326.0  30.8
2021-07-29 10:30:00   13.0  24.0
2021-07-29 10:20:00  277.0  35.5
2021-07-29 10:10:00  350.0  25.2
2021-07-29 10:00:00  285.0  28.6
2021-07-28 10:30:00   23.0  12.0
2021-07-28 10:20:00  279.0  24.9
2021-07-28 10:10:00  254.0  30.3
2021-07-28 10:00:00  352.0  34.8
2021-07-27 10:30:00   33.0  14.0
2021-07-27 10:20:00  309.0  21.2
2021-07-27 10:10:00  273.0  30.5
2021-07-27 10:00:00  340.0  30.0
2021-07-26 10:30:00   34.0  14.0
2021-07-26 10:20:00  334.0  30.9
2021-07-26 10:10:00  261.0  20.1
2021-07-26 10:00:00  284.0  29.5

Hank Gordon
  • 127
  • 1
  • 9
  • maybe this can help https://stackoverflow.com/questions/57093810/how-to-do-backward-resampling-on-time-series-data-starting-from-the-last-row – Joe Aug 06 '21 at 11:51
  • 1
    @Joe hmm thanks for your input. After some search I add a .sort_index(ascending=False) after resampling. Still a little weird to me but it works – Hank Gordon Aug 06 '21 at 12:42
  • This could be unintended behaviour (and possibly considered a bug), maybe ask on the Pandas issue tracker? – suvayu Aug 06 '21 at 13:08

0 Answers0