1

I have a dataset of hourly weather observations in this format:

df = pd.DataFrame({ 'date': ['2019-01-01 09:30:00', '2019-01-01 10:00', '2019-01-02 04:30:00','2019-01-02 05:00:00','2019-07-04 02:00:00'],
                  'windSpeedHigh': [155,90,35,45,15],
                   'windSpeedHigh_Dir':['NE','NNW','SW','W','S']})

My goal is to find the highest wind speed each day and the wind direction associated with that maximum daily wind speed.

Using resample, I have sucessfully found the maximum wind speed for each day, but not its associated direction:

df['date'] = pd.to_datetime(df['date'])
df['windSpeedHigh'] = pd.to_numeric(df['windSpeedHigh'])
df_daily = df.resample('D', on='date')[['windSpeedHigh_Dir','windSpeedHigh']].max()
df_daily

Results in:

windSpeedHigh_Dir   windSpeedHigh
date        
2019-01-01  NNW 155.0
2019-01-02  W   45.0
2019-01-03  NaN NaN
2019-01-04  NaN NaN
2019-01-05  NaN NaN
... ... ...
2019-06-30  NaN NaN
2019-07-01  NaN NaN
2019-07-02  NaN NaN
2019-07-03  NaN NaN
2019-07-04  S   15.0

This is incorrect as this resample is also grabbing the max() for 'windSpeedHigh_Dir'. For 2019-01-01 the direction for the associated windspeed should be 'NE' not 'NNW', because the wind direction df['windSpeedHigh_Dir'] == 'NE' when the maximum wind speed occurred.

So my question is, is it possible for me to resample this dataset from half-hourly to daily maximum wind speed while keeping the wind direction associated with that speed?

1 Answers1

0

Use DataFrameGroupBy.idxmax for indices by dates first:

df_daily = df.loc[df.groupby(df['date'].dt.date)['windSpeedHigh'].idxmax()]
print (df_daily)
                 date  windSpeedHigh windSpeedHigh_Dir
0 2019-01-01 09:30:00            155                NE
3 2019-01-02 05:00:00             45                 W
4 2019-07-04 02:00:00             15                 S

And then for add DatetimeIndex use DataFrame.set_index with Series.dt.normalize and DataFrame.asfreq:

df_daily = df_daily.set_index(df_daily['date'].dt.normalize().rename('day')).asfreq('d')
print (df_daily)
                          date  windSpeedHigh windSpeedHigh_Dir
day                                                            
2019-01-01 2019-01-01 09:30:00          155.0                NE
2019-01-02 2019-01-02 05:00:00           45.0                 W
2019-01-03                 NaT            NaN               NaN
2019-01-04                 NaT            NaN               NaN
2019-01-05                 NaT            NaN               NaN
                       ...            ...               ...
2019-06-30                 NaT            NaN               NaN
2019-07-01                 NaT            NaN               NaN
2019-07-02                 NaT            NaN               NaN
2019-07-03                 NaT            NaN               NaN
2019-07-04 2019-07-04 02:00:00           15.0                 S

[185 rows x 3 columns]

Your solution shoudl working with custom function, because idxmax failed for missing values with DataFrame.join:

f = lambda x: x.idxmax() if len(x) > 0 else np.nan
df_daily = df.resample('D', on='date')['windSpeedHigh'].agg(f).to_frame('idx').join(df, on='idx')

print (df_daily)
            idx                date  windSpeedHigh windSpeedHigh_Dir
date                                                                
2019-01-01  0.0 2019-01-01 09:30:00          155.0                NE
2019-01-02  3.0 2019-01-02 05:00:00           45.0                 W
2019-01-03  NaN                 NaT            NaN               NaN
2019-01-04  NaN                 NaT            NaN               NaN
2019-01-05  NaN                 NaT            NaN               NaN
        ...                 ...            ...               ...
2019-06-30  NaN                 NaT            NaN               NaN
2019-07-01  NaN                 NaT            NaN               NaN
2019-07-02  NaN                 NaT            NaN               NaN
2019-07-03  NaN                 NaT            NaN               NaN
2019-07-04  4.0 2019-07-04 02:00:00           15.0                 S

[185 rows x 4 columns]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252