So I have been trying to optimize some code that calculates a statistical error metric from some array data. The metric is called the Continuous Rank Probability Score (CRPS).
I have been using Numba to try to speed up the double for loop required in this calculation, but I have been having an issue with the numpy.vstack
function. From what I understand from the docs here, the vstack()
function should be supported, but when I run the following code I get an error.
def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):
"""Calculate the the continuous ranked probability score (CRPS) as per equation 25-27 in
Hersbach et al. (2000)
Parameters
----------
obs: 1D ndarry
Array of observations for each start date
fcst_ens: 2D ndarray
Array of ensemble forecast of dimension n x M, where n = number of start dates and
M = number of ensemble members.
remove_neg: bool
If True, when a negative value is found at the i-th position in the observed OR ensemble
array, the i-th value of the observed AND ensemble array are removed before the
computation.
remove_zero: bool
If true, when a zero value is found at the i-th position in the observed OR ensemble
array, the i-th value of the observed AND ensemble array are removed before the
computation.
Returns
-------
dict
Dictionary contains a number of *experimental* outputs including:
- ["crps"] 1D ndarray of crps values per n start dates.
- ["crpsMean1"] arithmetic mean of crps values.
- ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).
Notes
-----
**NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the
corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.
References
----------
- Hersbach, H. (2000) Decomposition of the Continuous Ranked Porbability Score
for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.
"""
# Treating the Data
obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)
# Set parameters
n = fcst_ens.shape[0] # number of forecast start dates
m = fcst_ens.shape[1] # number of ensemble members
# Create vector of pi's
p = np.linspace(0, m, m + 1)
pi = p / m
crps_numba = np.zeros(n)
@njit
def calculate_crps():
# Loop fcst start times
for i in prange(n):
# Initialise vectors for storing output
a = np.zeros(m - 1)
b = np.zeros(m - 1)
# Verifying analysis (or obs)
xa = obs[i]
# Ensemble fcst CDF
x = np.sort(fcst_ens[i, :])
# Deal with 0 < i < m [So, will loop 50 times for m = 51]
for j in prange(m - 1):
# Rule 1
if xa > x[j + 1]:
a[j] = x[j + 1] - x[j]
b[j] = 0
# Rule 2
if x[j] < xa < x[j + 1]:
a[j] = xa - x[j]
b[j] = x[j + 1] - xa
# Rule 3
if xa < x[j]:
a[j] = 0
b[j] = x[j + 1] - x[j]
# Deal with outliers for i = 0, and i = m,
# else a & b are 0 for non-outliers
if xa < x[0]:
a1 = 0
b1 = x[0] - xa
else:
a1 = 0
b1 = 0
# Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)
if xa > x[m - 1]:
am = xa - x[m - 1]
bm = 0
else:
am = 0
bm = 0
# Combine full a & b vectors including outlier
a = np.concatenate((np.array([0]), a, np.array([am])))
# a = np.insert(a, 0, a1)
# a = np.append(a, am)
a = np.concatenate((np.array([0]), a, np.array([bm])))
# b = np.insert(b, 0, b1)
# b = np.append(b, bm)
# Populate a_mat and b_mat
if i == 0:
a_mat = a
b_mat = b
else:
a_mat = np.vstack((a_mat, a))
b_mat = np.vstack((b_mat, b))
# Calc crps for individual start times
crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()
return crps_numba, a_mat, b_mat
crps, a_mat, b_mat = calculate_crps()
print(crps)
# Calc mean crps as simple mean across crps[i]
crps_mean_method1 = np.mean(crps)
# Calc mean crps across all start times from eqn. 28 in Hersbach (2000)
abar = np.mean(a_mat, 0)
bbar = np.mean(b_mat, 0)
crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()
# Output array as a dictionary
output = {'crps': crps, 'crpsMean1': crps_mean_method1,
'crpsMean2': crps_mean_method2}
return output
The error I get is this:
Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path
File "test.py", line 86:
def calculate_crps():
<source elided>
if i == 0:
a_mat = a
^
[1] During: typing of assignment at *path
File "test.py", line 89:
def calculate_crps():
<source elided>
else:
a_mat = np.vstack((a_mat, a))
^
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
I just wanted to know where I am going wrong. It seems as though the vstack
function should work but maybe I am missing something.