4

I am trying to find the probability of an event of a random variable past a specific value, i.e. pr(x>a), where a is some constant, typically much higher than the average of x, and x is not of any standard Gaussian distribution. So I wanted to fit some other probability density function, and take the integral of the pdf of x from a to inf. As this is a problem of modelling the spikes, I considered this an Extreme Value analysis problem, and found that the Weibull distribution might be appropriate.

Regarding extreme value distributions, the Weibull distribution has a very "not-easy-to-implement" integral, and I therefore figured I could just get the pdf from Scipy, and do a Riemann-sum. I also thought that I could as well simply evaluate the kernel density, get the pdf, and do the same with the Riemann sum, to approximate the integral.

I found a Q here on Stack which provided a neat method for doing Riemann sums in Python, and I adapted that code to fit my problem. But when I evaluate the integral I get weird numbers, indicating that something is either wrong with the KDE, or the Riemann sum-function.

Two scenarios, the first with the Weibull, in accordance with the Scipy documentation:

x = theData
x_grid = np.linspace(0,np.max(x),len(x))

p = ss.weibull_min.fit(x[x!=0], floc=0)
pd = ss.weibull_min.pdf(x_grid,p[0], p[1], p[2])

which looks like this:

enter image description here

and then also tried the KDE method as follows

pd = ss.gaussian_kde(x).pdf(x_grid)

which I subsequently run through the following function:

def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = a
    for i in range(n): 
        s += pd[x]
        x += dx
    return s * dx          
print(riemannSum(950.0, 1612.0, 10000))
print(riemannSum(0.0, 1612.0, 100000))

In the case of the Weibull, it gives me

>> 0.272502150549
>> 18.2860384829

and in the case of the KDE, I get

>> 0.448450460469
>> 18.2796021034

This is obviously wrong. Taking the integral of the entire thing should give me 1, and 18.2+ is quite far off.

Am I wrong in my assumptions of what I can do with these density functions? Or have I made some mistake in the Riemann sum function

TomServo
  • 7,248
  • 5
  • 30
  • 47
Niklas Lindeke
  • 380
  • 1
  • 7
  • 24
  • 1
    Look carefully at this and your Riemann sum code: https://en.wikipedia.org/wiki/Riemann_sum – Sinan Ünür Jul 31 '17 at 16:31
  • 1
    Calculating integrals numerically is a great problem to work on, but if your goal is actually to calculate tail probabilities, the Scipy class for the Weibull distribution already has a `cdf` method. – Robert Dodier Jul 31 '17 at 16:50

3 Answers3

3

the Weibull distribution has a very "not-easy-to-implement" integral

Huh?!

Weibull distribution has very well defined CDF, so implementing integral is pretty much one-liner (ok, make it two for clarity)

def WeibullCDF(x, lmbd, k):
    q = pow(x/lmbd, k)
    return 1.0 - exp(-q)

And, of course, there is ss.weibull_min.cdf(x_grid,p[0], p[1], p[2]) if you want to pick from standard library

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • Thank you! Can't believe how I could forget the cdf/pdf relationship. Got stuck on the fun challenge of approximating the integral instead. I'm lucky that my uni-grades can't be lowered once set, otherwise my old math-stat teacher would be quite furious. – Niklas Lindeke Aug 01 '17 at 07:52
1

I know there is an accepted answer that worked for you but I stumbled across this while looking to see how to do a Riemann sum of a probability density and others may too so I will give this a go.

Basically, I think you had (what is now) an older version of numpy that allowed floating point indexing and your pd variable pointed to an array of values drawn from the pdf corresponding to the values at xgrid. Nowadays you will get an error in numpy when trying to use a floating point index but since you didn't you were accessing the value of the pdf at the grid values corresponding to that index. What you needed to do was calculate the pdf with the new values you wanted to use in your Riemann sum.

I edited the code from the question to create a method that works for calculating the integral of the pdf.

def riemannSum(a, b, n):
     dx = (b-a)/n
     s = 0.0
     x = 0
     pd = weibull_min.pdf(np.linspace(a, b, n), p[0], p[1], p[2])
     for i in range(n):
         s += pd[x]
         x += 1
     return s*dx
brycek
  • 23
  • 7
0

Below Riemann implementation can also be used (it uses Java instead of Python) sorry.

import static java.lang.Math.exp;
import static java.lang.Math.pow;

import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.IntStream;

public class WeibullPDF
{
    public interface Riemann extends BiFunction<Function<Double, Double>, Integer, 
    BinaryOperator<Double>>     { }

    public static void main(String args[])
    {
        int N=100000;
        Riemann s = (f, n) -> (a, b) -> 
        IntStream.range(0, n).
        .mapToDouble(i->f.apply(a+i*((b-a)/n))*((b-a)/n)).sum();
        double k=1.5;
        Optional<Double> weibull = 
        Optional.of(s.apply(x->k*pow(x,k-1)*exp(-pow(x,k)),N).apply(0.0,1612.0));
        weibull.ifPresent(System.out::println); //prints 0.9993617886716168
    }
}
  • 2
    Questioner wants `python` not `java`. Please change it. – Hossein Golshani Oct 12 '18 at 22:47
  • I think my answer is important even if it doesn't answer the question in Python, since it shows how to use a high order function to implement a Riemann Sum from scratch and how to pass the Weibull PDF (probability density function) as a first order function. It may help other developers which want to understand how to solve this problem without using external libraries and functional programming with Java. – Jose Luis Soto Posada Oct 13 '18 at 13:42
  • @JoseLuisSotoPosada Your implementation is not done in a way that can **easily** be read by non-Java coders. – jason m Sep 23 '20 at 21:18