Python time-lat-lon array manipulation and grouping

Question

For a t-x-y array representing time-latitude-longitude, where the values of the t-x-y grid hold arbitrary measured variables, how can I 'group' x-y slices of the array for a given time condition?

For example, if a companion t-array is a 1-D list of datetimes, how can I find the element-wise mean of the x-y grids whose month equals 1? If t has only 10 elements where month = 1, then I want a (10, len(x), len(y)) array. From there I know I can do np.mean(out, axis=0) to get the desired mean values across the x-y grid, where out is the result of the array manipulation.
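A minimal sketch of that selection, assuming the companion `t` is a plain Python list of `datetime.datetime` objects; the dates, array sizes and random data below are made up for illustration. The idea is to pull the month out of each datetime into a NumPy array, then use a boolean mask on the time axis.

```python
import datetime as dt

import numpy as np

# Hypothetical data: 24 monthly time stamps and a random (time, x, y) grid
t = [dt.datetime(2000 + i // 12, i % 12 + 1, 15) for i in range(24)]
a = np.random.default_rng(0).random((len(t), 50, 50))

# Companion array holding just the month of each time stamp
months = np.array([d.month for d in t])

# Boolean mask selects the x-y slices whose month equals 1
out = a[months == 1]
print(out.shape)  # (2, 50, 50): January 2000 and January 2001

# Element-wise mean over the selected slices
january_mean = np.mean(out, axis=0)
print(january_mean.shape)  # (50, 50)
```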

The shape of t-x-y is approximately (2000, 50, 50), that is, a (50, 50) grid of values for each of 2000 different times. Assume that the number of unique conditions (whether I'm slicing by month or by year) is much smaller than the total number of elements in the t array.
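Because the number of distinct groups is small, one option (a sketch, not the only way) is to compute every group mean in a single pass: `np.unique(..., return_inverse=True)` maps each time step to a group index, `np.add.at` accumulates the grid sums per group, and dividing by the group counts gives the means. The shapes match the question; the data and the `keys` array are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, nx, ny = 2000, 50, 50
a = rng.random((n_t, nx, ny))
# Hypothetical group key per time step, e.g. month numbers 1..12
keys = rng.integers(1, 13, size=n_t)

# groups: sorted unique keys; inverse[i] is the group index of time step i
groups, inverse = np.unique(keys, return_inverse=True)

# Sum each x-y slice into its group's accumulator (unbuffered, so repeats add up)
sums = np.zeros((len(groups), nx, ny))
np.add.at(sums, inverse, a)

# Number of time steps in each group, broadcast over the x-y grid
counts = np.bincount(inverse, minlength=len(groups))
means = sums / counts[:, None, None]

print(groups)       # the group labels, one per row of `means`
print(means.shape)  # (n_groups, 50, 50)
```

The same code works for any integer key, so grouping by year (or by an encoded year-month value) only changes how `keys` is built.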

What is the most pythonic way to achieve this? This operation will be repeated with many datasets, so a computationally efficient solution is preferred. I'm relatively new to Python (I can't even figure out how to create an example array for you to test with), so feel free to recommend other modules that may help. (I have looked at Pandas, but it seems like it mainly handles 1-D time-series data...?)
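Pandas can still help here, assuming it is acceptable to flatten each 50 x 50 grid into one row: reshape to a 2-D (time, x*y) table, let `DataFrame.groupby` group the rows by month, then reshape the result back into grids. A minimal sketch with made-up daily data:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: daily time stamps and a random (time, x, y) array
times = pd.date_range("2000-01-01", periods=730, freq="D")
nx, ny = 50, 50
a = np.random.default_rng(0).random((len(times), nx, ny))

# One row per time step, one column per grid cell
df = pd.DataFrame(a.reshape(len(times), nx * ny), index=times)

# Group rows by calendar month and average each column (grid cell)
monthly = df.groupby(df.index.month).mean()

# Back to a (n_months, x, y) array; monthly.index holds the month numbers
monthly_grids = monthly.to_numpy().reshape(-1, nx, ny)
print(monthly.index.tolist())  # month numbers 1 to 12
print(monthly_grids.shape)     # (12, 50, 50)
```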

Edit:

This is the best I can do as an example array:

>>> t = np.repeat([1,2,3,4,5,6,7,8,9,10,11,12],83)
>>> t.shape
(996,)
>>> a = np.random.randint(1,101,2490000).reshape(996, 50, 50)
>>> a.shape
(996, 50, 50)
>>> list(set(t))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

So a is the array of random data and t is (say) the array representing months of the year, in this case just plain integers. In this example there are 83 instances of each month. How can we separate out the 83 x-y slices of a that correspond to t = 1 (to create a monthly mean dataset)?
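With the example arrays above, a boolean mask `t == m` picks out exactly the slices for month `m`; looping over the twelve months and stacking the results gives a full monthly-mean dataset. This is a sketch; the arrays are rebuilt so the block runs on its own.

```python
import numpy as np

# Rebuild the example arrays from the question
t = np.repeat(np.arange(1, 13), 83)  # month of each slice
a = np.random.randint(1, 101, 996 * 50 * 50).reshape(996, 50, 50)

# Slices where t == 1: boolean indexing keeps the (x, y) axes intact
jan = a[t == 1]
print(jan.shape)  # (83, 50, 50)

# One mean grid per month, stacked into a (12, 50, 50) array
months = np.unique(t)
monthly_mean = np.stack([a[t == m].mean(axis=0) for m in months])
print(monthly_mean.shape)  # (12, 50, 50)
```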

What is the datatype stored in your companion `t`-array (python datetime, or numpy datetime64)? You've got the right idea in your answer below, in that you want to extract the month data (or (year,month) if your dataset span is longer than a year) from your datetime into a companion array, and then group by that. [Numpy isn't designed to do grouped aggregations well.](http://stackoverflow.com/questions/11989164/numpy-mean-structured-array) — mtadd, May 20 '14 at 16:36
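Following that suggestion, if `t` is a NumPy `datetime64` array, casting it to month precision (`datetime64[M]`) gives a combined (year, month) key directly, so datasets spanning several years are not pooled by accident. A sketch with hypothetical daily time stamps and random data:

```python
import numpy as np

# Hypothetical daily time stamps covering two years
t = np.arange("2000-01-01", "2002-01-01", dtype="datetime64[D]")
a = np.random.default_rng(0).random((len(t), 50, 50))

# Truncate to month precision: each value now identifies a (year, month)
year_month = t.astype("datetime64[M]")

# Calendar month alone (1..12), if years should be pooled instead;
# datetime64[M] counts months since 1970-01
month = year_month.astype(int) % 12 + 1

# Mean grid for January 2001 only
sel = year_month == np.datetime64("2001-01")
jan_2001 = a[sel].mean(axis=0)
print(sel.sum(), jan_2001.shape)  # 31 (50, 50)
```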

score 0 · Accepted Answer · answered May 20 '14 at 15:51

One possible answer to the (my) question, using numpy.where

To find the slices of a, where t = 1:

>>> import numpy as np
>>> out = a[np.where(t == 1),:,:]

although this gives the slightly confusing (to me at least) output of:

>>> out.shape
(1, 83, 50, 50)

but if we follow through and take the mean I need,

>>> out2 = np.mean(np.mean(out, axis=0), axis=0)

the result is reduced to the expected shape:

>>> out2.shape
(50, 50)
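The nested `np.mean` calls average first over the leftover length-1 axis and then over the 83 selected slices. Since NumPy reductions accept a tuple of axes, a single call does the same, and indexing with the boolean mask avoids the extra axis altogether. A quick check, rebuilding the example arrays `a` and `t` from the question:

```python
import numpy as np

t = np.repeat(np.arange(1, 13), 83)
a = np.random.randint(1, 101, 996 * 50 * 50).reshape(996, 50, 50)

out = a[np.where(t == 1), :, :]            # shape (1, 83, 50, 50)

two_step = np.mean(np.mean(out, axis=0), axis=0)
one_step = out.mean(axis=(0, 1))           # reduce both leading axes at once
direct = a[t == 1].mean(axis=0)            # no extra axis to begin with

print(two_step.shape, one_step.shape, direct.shape)  # (50, 50) (50, 50) (50, 50)
print(np.allclose(two_step, one_step), np.allclose(two_step, direct))  # True True
```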

Can anyone improve on this or see any issues here?
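On the confusing (1, 83, 50, 50) shape: `np.where` returns a tuple of index arrays, one per dimension of `t`. Nesting that tuple inside another index tuple makes NumPy convert it to a (1, 83) integer index array, which adds the leading axis. Unpacking the tuple, or indexing with the boolean mask directly, keeps three axes. A small check with the example arrays:

```python
import numpy as np

t = np.repeat(np.arange(1, 13), 83)
a = np.random.randint(1, 101, 996 * 50 * 50).reshape(996, 50, 50)

idx = np.where(t == 1)
print(type(idx), len(idx))    # <class 'tuple'> 1

# The tuple nested in another index is converted to a (1, 83) integer array
print(a[idx, :, :].shape)     # (1, 83, 50, 50)

# Unpacking the tuple (or indexing with the mask) keeps three axes
print(a[idx[0], :, :].shape)  # (83, 50, 50)
print(a[t == 1].shape)        # (83, 50, 50)
```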

`a[t==1].shape == (83, 50, 50)` – mtadd, May 20 '14 at 16:01
