0

Context

I have download u- and v- wind components from: ERA5 reanalysis ("reanalysis-era5-pressure-levels", variable observation) and ECMWF's Seasonal Forecast ("seasonal-original-pressure-levels", variable forecast) for the same year (2021), and I would like to use Continuous Ranked Probability Score (CRPS) from properscoring to evaluate the performance of the forecast.

For evaluation, the idea is to use xr.apply_ufunc over number (members) to calculate, for every: time, latitude and longitude; the CRPS, something like:

import properscoring as ps

crps = xr.apply_ufunc(
    ps.crps_ensemble,
    observation,
    forecast,
    input_core_dims=[[], ["number"]],
    output_core_dims=[[]],
    vectorize=True,
)

However, observation is indexed by ("time", "latitude", "longitude"), while forecast is indexed by ("time", "latitude", "longitude", "steps", "number"); making them not compatible (xr.align(..., join = "exact") fails because of both time and steps).

How can I transform observations's time dimension into ("time", "steps")?

What I have tried

Using backend_kwargs

For another project I used xr.open_dataset(..., backend_kwargs={"time_dims": ("valid_time",)}), which made valid_time match with time dimension and things worked out. However, for this current project, there is overlap between valid_time and time with steps (e.g. time=2021-01-01 + steps=36:00:00 > time=2021-01-02), which makes this solution undesirable, as values are lost.

forecast = xr.open_dataset(
    "data/raw/forecast.grib",
    engine="cfgrib",
    backend_kwargs=dict(time_dims=("valid_time",)),
)
# forecast is missing values corresponding to `steps > 24:00:00`

Concatenating into a new dimension

Another solution that I tried implementing is concatenating this rolling window as different steps: for each day, extract the next 36 hours as steps into a new dimension, steps. However, this did neither create a new dimension (maybe xr.concat is not the correct function?) nor was fast enough (took more than 6 minutes for two months).

xr.concat(
    [
        observation.sel(time = slice(date, date + pd.Timedelta("36H")))
        for date in observation.time.values
    ],
    dim = "steps"
)
Felipe Whitaker
  • 470
  • 3
  • 9
  • why don't you carry out the calculation for each `step` separately? – Matteo De Felice Jul 17 '23 at 10:12
  • I couldn't think how I could organize that either. How do you suggest implementing it, @MatteoDeFelice? I am trying to match the dimensions so `xarray` can do the calculations – Felipe Whitaker Jul 17 '23 at 15:52
  • I did something similar using the same data a couple of years ago, now I don't remember how I solved this issue but the code is here: https://github.com/matteodefelice/C3S_evaluator – Matteo De Felice Jul 17 '23 at 18:31
  • Great project! I notice that you used `backend_kwargs=dict(time_dims = ('verifying_time',)`, is there an explanation for it? When I use `valid_time` it only stores the first value that corresponds to that timing (see [this email](https://groups.google.com/g/xarray/c/NGrBVGvOFSo)), and `verifying_time` turned the 5 years daily 12-hourly forecasts into a single `(number, lat, lon)` dataset. Do you know how these `backend_kwargs` work? – Felipe Whitaker Jul 17 '23 at 20:56
  • as far as I remember, that was needed to deal with overlapping times, see here: https://stackoverflow.com/questions/76700505/how-to-treat-overlapping-time-from-forecast-steps – Matteo De Felice Aug 22 '23 at 21:22

0 Answers0