0

I have a weather dataframe which is formatted in half hour intervals (simplified below):

df = pd.DataFrame({ 'date': ['2019-01-01 09:30:00', '2019-01-01 10:00', '2019-01-02 04:30:00','2019-01-02 05:00:00','2019-01-04 02:00:00'],
                   'temp': [15.2,18.0,4.5,4.5,6.5],
                  'windSpeedMax': [20,90,35,45,15],
                   'windSpeedMax_Dir':['NE','SE','SW','W','S']})

My goal is to get the maximum wind AND the direction that wind is coming from each day. Using pandas:

df['date'] = pd.to_datetime(df['date'])
df = df[~df['windSpeedHigh'].isin(['--'])]
df['windSpeedHigh'] = pd.to_numeric(df['windSpeedHigh'])
df_daily = df.resample('D', on='date').agg({'windSpeedHigh': max})

My result is:

            windSpeedHigh
date                     
2019-01-01           90.0
2019-01-02           45.0
2019-01-03            NaN
2019-01-04           15.0

Which only finds the max wind speed each day

My intended result also needs to include the direction the max wind occurred:

            windSpeedHigh    windSpeedHigh
date                     
2019-01-01           90.0    SE
2019-01-02           45.0    W
2019-01-03            NaN    NaN
2019-01-04           15.0    S

1 Answers1

3

You can do groupby().idxmax():

idx = df.groupby(df['date'].dt.normalize())['windSpeedHigh'].idxmax()
df.loc[idx]
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74