4

I am working on replicating the scoring rule found in a paper Forecasting the intermittent demand for slow-moving inventories: A modelling approach

The paper describes the scoring rule as follows:


enter image description here


This is my attempt

y <- rpois(n = 100, lambda = 10) # forecasted distribution
x <- 10 # actual value

drps_score <- function(x = value, y = q){
  # x = actual value (single observation); y = quantile forecasted value (vector)
  Fy = ecdf(y) # cdf function
  indicator <- ifelse(y - x > 0, 1, 0) # Heaviside
  score <- sum((indicator - Fy(y))^2)
  return(score)
}

> drps_score(x = x, y = y)
[1] 53.028

This seems to work well until I provide a vector of 0s as follows:

y <- rep(x = 0, 100)
> drps_score(x = x, y = y)
[1] 0

I know that one of their methods used in this paper was a 0s forecast and their results did not show 0 for DRPS. This makes me think that the calculation is off.

Alex
  • 2,603
  • 4
  • 40
  • 73
  • 1
    What exactly are you trying to compute when your `y` is 100 zeroes? What kind of CDF you expect to be computed? It is not a proper distribution – Severin Pappadeux Oct 10 '20 at 18:14
  • So this could be hitting my my own misunderstanding. I would assume that any distribution could be used even outside of the exponential family. In my mind even a forecast of 0s for all percentiles, 0 to 99, is a fine distribution. If that is what a user wants for forecast then that seems like a reasonable input. – Alex Oct 10 '20 at 23:32
  • Well, in case you precisely know your value in the distribution PDF(x|x0) = δ(x-x0) and CDF=H(x-x0), where H() is heaviside step function. I would question value of forecast in such case. Basically, you predict value on the left of CDF H(x-x0), your heaviside is 0, CDF heaviside is 0, score is 0. – Severin Pappadeux Oct 10 '20 at 23:44
  • Okay, but does my implementation match the description? – Alex Oct 11 '20 at 11:09
  • 1
    As far as I can tell, your `indicator` is not the same one as presented by that paper. It should be `indicator <- ifelse(y - x > 0, 1, 0)`. – ekoam Oct 12 '20 at 17:54
  • @ekoam Thanks, I missed that. I will edit that. – Alex Oct 12 '20 at 22:26

1 Answers1

3

I think there are a few issues at play here.

First off, I don't think you are computing the correct sum inside the scoring function. The score asks you to sum across all possible values of y (i.e. across all positive integers) not across all forecasted samples of y.

Second, I don't think the above definition gives the desired result, with \hat F (y) defined to be 0 when y=x then you don't get a zero score for a forecast with a point mass at the true value. (Yes, I'm saying that source is "wrong", or at least has a definition that doesn't give the desired result.) Here is a re-formulated function that I think fixes both issues:

x <- 10 # actual value

drps_score <- function(x = value, y = q, nsum=100){
    # x = actual value (single observation); y = quantile forecasted value (vector)
    Fy = ecdf(y) # cdf function
    ysum <- 0:nsum
    indicator <- ifelse(ysum - x >= 0, 1, 0) # Heaviside
    score <- sum((indicator - Fy(ysum))^2)
    return(score)
}



> drps_score(x = x, y = rpois(n = 1000, lambda = 8))
[1] 1.248676
> drps_score(x = x, y = rpois(n = 1000, lambda = 9))
[1] 0.878183
> drps_score(x = x, y = rpois(n = 1000, lambda = 10))
[1] 0.692667
> drps_score(x = x, y = rep(10, 100))
[1] 0
> drps_score(x = x, y = rpois(n = 1000, lambda = 11))
[1] 0.883333

The above shows that the distribution that is centered on the true value (lambda=10) has the lowest score for distributions that aren't a point mass.

Nicholas G Reich
  • 1,028
  • 10
  • 21