I have a 3D (time, X, Y) numpy array containing 6 hourly time series for a few years. (say 5). I would like to create a sampled time series containing 1 instance of each calendar day randomly taken from the available records (5 possibilities per day), as follows.
- Jan 01: 2006
- Jan 02: 2011
- Jan 03: 2009
- ...
this means I need to take 4 values from 01/01/2006, 4 values from 02/01/2011, etc. I have a working version which works as follows:
- Reshape the input array to add a "year" dimension (Time, Year, X, Y)
- Create a 365 values array of randomly generated integers between 0 and 4
- Use np.repeat and array of integers to extract only the relevant values:
Example:
sampledValues = Variable[np.arange(numberOfDays * ValuesPerDays), sampledYears.repeat(ValuesPerDays),:,:]
This seems to work, but I was wondering if this is the best/fastest approach to solve my problem? Speed is important as I am doing this in a loop, adn would benefit from testing as many cases as possible.
Am I doing this right?
Thanks
EDIT I forgot to mention that I filtered the input dataset to remove the 29th of feb for leap years.
Basically the aim of that operation is to find a 365 days sample that matches well the long term time series in terms on mean etc. If the sampled time series passes my quality test, I want to export it and start again.