I would like to compute the inverse cumulative density function (inverse cdf) of a given pdf. The pdf is directly given as a histogram, ie., a vector of N equally spaced components.
My current approach is to do :
cdf = cumsum(pdf);
K = 3; %// some upsampling factor
maxVal = 1; %// just for my own usage - a scaling factor
M = length(cdf);
N = M*K; %// increase resolution for higher accuracy
y = zeros(N, 1);
cursor = 2;
for i=1:N
desiredF = (i-1)/(N-1)*maxVal;
while (cursor<M && cdf(cursor)<desiredF)
cursor = cursor+1;
end;
if (cdf(cursor)==cdf(cursor-1))
y(i) = cursor-1;
else
alpha = min(1, max(0,(desiredF - cdf(cursor-1))/(cdf(cursor)-cdf(cursor-1))));
y(i) = ((cursor-1)*(1-alpha) + alpha*cursor )/maxVal;
end;
end;
y = resample(y, 1, K, 0);
which means that I upsample with linear interpolation, inverse and downsample my histogram. This is rather an ugly code, is not very robust (if I change the upsampling factor, I can get really different results), and is uselessly slow... could anyone suggest a better approach ?
Note : the generalized inverse I am trying to compute (in the case the cdf is not invertible) is :
F^{-1}(t) = \inf{x \in R ; F(x)>t }
with F the cumulative density function
[EDIT : actually, K = 1 (ie., no upsampling) seems to give more accurate results...]
Thanks!