Pandas: Resample dataframe column, get discrete feature that corresponds to max value

Question

Sample data:

import pandas as pd
import numpy as np
import datetime

data = {'value': [1, 2, 4, 3], 'names': ['joe', 'bob', 'joe', 'bob']}
start, end = datetime.datetime(2015, 1, 1), datetime.datetime(2015, 1, 4)
# Build the daily index with pd.date_range; DatetimeIndex(start=..., end=...) was removed in pandas 1.0
test = pd.DataFrame(data=data, index=pd.date_range(start=start, end=end, freq="D"),
                    columns=["value", "names"])

gives:

            value names
2015-01-01      1   joe
2015-01-02      2   bob
2015-01-03      4   joe
2015-01-04      3   bob

I want to resample into 2-day bins ('2D') and, for each bin, get the max value together with the name on that row, something like:

test.resample('2D')

The expected result should be:

            value names
2015-01-01      2   bob
2015-01-03      4   joe

Can anyone help me?

I've updated my answer if you're interested. – piRSquared Jun 27 '17 at 22:13

score 6 · Accepted Answer · answered Jun 27 '17 at 21:03

You can resample to get the index label (the arg max) of the largest `value` in each bin, and then use those labels to look up `names` and `value` in the original frame:

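# idxmax yields, per 2-day bin, the timestamp of the row with the largest value;
# assign then looks up names and value at those timestamps in the original frame
(test.resample('2D')[['value']].idxmax()
     .assign(names=lambda x: test.loc[x.value]['names'].values,
             value=lambda x: test.loc[x.value]['value'].values)
)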
Out[116]: 
            value names
2015-01-01      2   bob
2015-01-03      4   joe
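
The same two-step idea can also be written out step by step. This is only a sketch under a few assumptions: it uses `groupby` with `pd.Grouper(freq="2D")` in place of `resample('2D')` so that it does not rely on `idxmax` being available on the resampler object, it rebuilds the question's `test` frame with `pd.date_range`, and the names `winners` and `result` are made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
test = pd.DataFrame(
    {"value": [1, 2, 4, 3], "names": ["joe", "bob", "joe", "bob"]},
    index=pd.date_range("2015-01-01", "2015-01-04", freq="D"),
)

# Step 1: for each 2-day bin, find the timestamp of the row with the largest value.
# The resulting Series is indexed by bin start and holds the winning timestamps.
winners = test.groupby(pd.Grouper(freq="2D"))["value"].idxmax()

# Step 2: pull those rows from the original frame, then relabel them by bin start
# so the index matches what the question expects.
result = test.loc[winners.to_numpy()]
result.index = winners.index

print(result)
```

Keeping the intermediate `winners` Series around makes it easy to check which original timestamp was selected for each bin before the rows are relabeled.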

Allen Qin


Super solution. This also extends to data with the same dates. – EB88 Jun 27 '17 at 21:15

score 3 · Answer 2 · piRSquared · 2017-06-27T22:09:33.493

Use `apply` and return the row with the maximal `value` from each bin. The resulting rows are labeled by the resample bins.

test.resample('2D').apply(lambda df: df.loc[df.value.idxmax()])

            value names
2015-01-01      2   bob
2015-01-03      4   joe
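
A comment on this answer reports an `AttributeError` on pandas v1.1.2, which suggests that on some versions the function passed to `resample(...).apply` may receive a single column rather than the whole frame. One possible way around that, sketched here as an assumption rather than a confirmed fix, is to go through `groupby` with `pd.Grouper(freq="2D")`, where `apply` hands each bin to the function as a DataFrame. The helper name `top_row` is hypothetical.

```python
import pandas as pd

# Rebuild the sample frame from the question.
test = pd.DataFrame(
    {"value": [1, 2, 4, 3], "names": ["joe", "bob", "joe", "bob"]},
    index=pd.date_range("2015-01-01", "2015-01-04", freq="D"),
)

def top_row(group):
    """Return the row of `group` holding the largest 'value' (hypothetical helper)."""
    return group.loc[group["value"].idxmax()]

# Each 2-day bin is passed to top_row as a DataFrame; the returned rows are
# stacked into a frame indexed by the bin start.
result = test.groupby(pd.Grouper(freq="2D")).apply(top_row)

print(result)
```

Because each returned row mixes an integer and a string, the `value` column of `result` may come back with object dtype; converting it afterwards (for example with `astype(int)`) restores a numeric column if that matters.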

edited Jun 27 '17 at 22:09

answered Jun 27 '17 at 21:09

piRSquared


As I said to ayhan :-), this doesn't give the index that the OP is expecting. There might be a slick way to do it in one line, but I think you could just name the idxmax() result something and then set_index(ii.index) to patch it. – DSM Jun 27 '17 at 21:11
@DSM I did the same thing inside an `apply`. This way the indices are handled by the `resample` but I get the rows I want. Thanks for letting me know, I was in a meeting and couldn't respond right away (-: – piRSquared Jun 27 '17 at 22:10
This gets `AttributeError: 'Series' object has no attribute 'value'` on pandas v1.1.2. – user2561747 Sep 25 '20 at 16:57
`value` in this context was a column designated by the OP. It could have been written as `test.resample('2D').apply(lambda df: df.loc[df['value'].idxmax()])` to make it clearer. – piRSquared Sep 25 '20 at 17:00

score 0 · Answer 3 · answered May 17 '22 at 20:01

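`idxmax` works well unless some of the resampling bins are empty. For example, if you resample daily and one day has no rows, `idxmax` raises an error for that bin instead of returning NaN.

The helper below works around the problem: resample the key column first (empty bins become NaN), then map the other columns back onto the resampled frame by matching on the key value.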

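import numpy as np

def map_resample_columns(original_df, resample_df, key_col, cols):
    """
    Add columns from original_df back onto resample_df.
    resample_df: result of resampling original_df on key_col (e.g. with max)
    cols: list of columns from original_df to be added back to resample_df
    Rows are matched on the key_col value across the whole original_df;
    if several rows share that value, the first match is used.
    """
    for col in cols:
        record_info = []
        for idx, row in resample_df.iterrows():
            val = row[key_col]
            if not np.isnan(val):
                # Take the first original row whose key value equals the resampled value
                record_info.append(original_df[original_df[key_col] == val][col].tolist()[0])
            else:
                # Empty bin: nothing to look up, so keep NaN
                record_info.append(np.nan)
        resample_df[col] = record_info
    return resample_df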
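
The same value-matching idea can be sketched without the explicit row loop, using `Series.map`. The sample data below (with no rows for 2015-01-03) and the names `daily` and `lookup` are assumptions made up for illustration. Like the helper above, this sketch matches on the key value across the whole frame, so it is only reliable when each maximum value identifies a single row.

```python
import pandas as pd

# Hypothetical sample with a gap: there are no rows for 2015-01-03.
df = pd.DataFrame(
    {"value": [1, 2, 3], "names": ["joe", "bob", "bob"]},
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-04"]),
)

# Daily max of the key column; the empty day becomes NaN, so the column is float.
daily = df.resample("D")[["value"]].max()

# Lookup table from value to name, keeping the first row for each distinct value.
lookup = df.drop_duplicates(subset="value").set_index("value")["names"]
# Align the lookup keys with the float dtype of daily["value"].
lookup.index = lookup.index.astype("float64")

# NaN has no entry in the lookup, so the empty day maps to NaN instead of raising.
daily["names"] = daily["value"].map(lookup)

print(daily)
```

Additional columns can be added the same way by building one lookup Series per column.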
