Average all rows/columns in 3D numpy array at each timestep (band)

Question

I have a 3D array of type numpy.ma.core.MaskedArray. The data are 10m wind direction values at every lat/lon point in a 4x4 grid. The dataset is hourly so I have 87672 individual matrices (ten years of data).

For each hour, I want the mean of the 4x4 matrix in order to get the average wind direction for the whole lat/lon gridbox. I'd like to then store these values as a column of a dataframe. I can do this easily with a for loop, but it's a little slow for my taste.

Here is what the data look like:

wdir10:
masked_array(
  data= [[[152.67026 , 146.70743 , 152.55719 , 164.92401 ],
         [130.54579 , 130.6751  , 146.74638 , 159.93202 ],
         [116.40863 , 119.380585, 133.9567  , 153.77013 ],
         [110.93645 , 118.25403 , 128.3094  , 146.62206 ]],

        [[134.27574 , 135.58499 , 149.5903  , 159.4063  ],
         [115.946495, 119.14671 , 134.47972 , 147.49466 ],
         [109.198265, 113.795906, 126.024475, 144.82605 ],
         [108.69715 , 117.25688 , 125.6559  , 141.5147  ]],

        [[119.89018 , 130.3573  , 150.05553 , 168.43152 ],
         [115.14506 , 120.63544 , 134.53693 , 150.49675 ],
         [117.6862  , 122.55777 , 132.94057 , 150.32137 ],
         [121.804016, 127.57132 , 136.711   , 152.43686 ]],

        ...,

I can do:

import numpy as np
import pandas as pd

dtime = pd.date_range(start='2012-01-01 00:00:00', end='2021-12-31 23:00:00', freq='H')
wind_df = pd.DataFrame(dtime)
wind_df['wdir.10'] = np.nan
for i in range(len(dtime)):
    # .loc avoids the chained assignment wind_df['wdir.10'][i] = ...
    wind_df.loc[i, 'wdir.10'] = np.mean(wdir10[i, :, :])

This works fine, but it takes a little longer than I'd like (about 20 seconds in my Spyder environment). Since I'm going to be doing this for several other variables (wind direction at 50m and 100m, plus wind speed at 10m, 50m and 100m), I'd like it to be faster. Is there a way to vectorize the process? Or can I maybe use groupby?

Thanks in advance.

If there are masked elements, check whether there's an `np.ma.mean` function or a `mean` method. In either case, check the docs for the axis parameter. — hpaulj, Apr 14 '22 at 14:16
Thanks @ProfessorPantsless, that did the trick! I had tried np.mean(wdir10, axis = (0)) before. I guess I didn't understand how the axis parameter worked. — Hellonskis, Apr 15 '22 at 09:14

score 0 · Accepted Answer · answered Apr 19 '22 at 00:48

I have not tested performance, but you should get a boost from using the built-in mean method and passing the axis parameter as a tuple, which reduces along several axes at once:

np.mean(wdir10, axis = (1, 2))

Note that, according to the numpy documentation, this would also be equivalent:

wdir10.mean(axis = (1, 2))
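
To make the axis behaviour concrete, here is a minimal sketch of the whole workflow. The array below is a small random stand-in for wdir10 (48 hours instead of 87672, with one cell masked by hand), so the values are assumptions for illustration only. It shows that axis=(1, 2) collapses the two spatial axes and leaves one value per hour, whereas axis=0 averages over time and returns a single 4x4 grid.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for wdir10: 48 hourly 4x4 grids of made-up values.
rng = np.random.default_rng(0)
n_hours = 48
wdir10 = np.ma.masked_array(rng.uniform(100.0, 170.0, size=(n_hours, 4, 4)))
wdir10[0, 0, 0] = np.ma.masked  # masked cells are left out of the mean

# Reduce over the two spatial axes (1 and 2), keeping the time axis (0).
hourly_mean = wdir10.mean(axis=(1, 2))  # shape (n_hours,)

# For comparison: axis=0 averages over time and yields one 4x4 grid.
time_mean = wdir10.mean(axis=0)  # shape (4, 4)

# Build the dataframe and assign the whole column in one step, no loop.
dtime = pd.date_range(start="2012-01-01 00:00:00", periods=n_hours,
                      freq=pd.Timedelta(hours=1))
wind_df = pd.DataFrame({"time": dtime})
# filled(np.nan) converts any fully masked hour to NaN before storing.
wind_df["wdir.10"] = hourly_mean.filled(np.nan)
```

Assigning the result as a whole column replaces the per-row loop, which is where most of the time in the original approach is likely to go.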
