I have a "truncated dataset" and need to infer the distribution that most likely generated it. Although I only have the truncated data rather than the "full dataset", I think the best-fitting distribution should be the one that describes the "full dataset". It would look something like the blue line in this plot:
Do you have any comments, suggestions, or ideas on how to obtain that blue line (in Matlab, R, Python, etc.)?
When I tried to reproduce the blue line in the figure above with Matlab's fitdist function, i.e. to recover the best-fitting distribution as if I had the "full dataset", I was not successful. Below is a comparison of fitdist applied to the "full dataset" and to the "truncated dataset", both generated from the same distribution, i.e. makedist('Normal','mu',3).
% (1) from a normal probability distribution, i.e. "makedist('Normal','mu',3)",
% create:
% (i) a "full dataset" and
% (ii) a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_full = random(pd,10000,1);
data_trunc = random(t,10000,1);
% (2) fit the normal distribution to
% (i) the "full dataset"
% (ii) the set of "truncated data"
pd_fit_full = fitdist(data_full,'normal');
pd_fit_trunc = fitdist(data_trunc,'normal');
% (3) plot
% (i.a) the "histogram of the full dataset" (from the "full dataset")
% (i.b) the density function corresponding to the distribution that fits the "full dataset"
% (ii.a) the "truncated histogram" (from the "truncated data")
% (ii.b) the density function corresponding to the distribution that fits the "truncated histogram"
xgrid = linspace(0,10,1000)';   % grid matches the plotted range set by xlim([0 10])
hold on
histogram(data_full,100,'Normalization','pdf','facecolor','red')
line(xgrid,pdf(pd_fit_full,xgrid),'Linewidth',2,'color','red')
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf(pd_fit_trunc,xgrid),'Linewidth',2,'color','blue')
hold off
xlim([0 10])
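
One direction that may be worth trying, as a minimal sketch rather than a definitive solution: fitdist(data_trunc,'normal') fits an ordinary normal to the truncated sample, so it cannot recover the parameters of the underlying distribution. If the truncation point is known (assumed here to be 3, matching truncate(pd,3,inf)), the likelihood can instead be written for a normal density renormalized over the observed region and maximized with mle. The names xTrunc, pdf_truncnorm, phat and pd_fit are illustrative and not part of the script above, and the seed is arbitrary.

```matlab
% Sketch: maximum-likelihood fit of a normal that is left-truncated at a KNOWN point.
% Assumption: the truncation point xTrunc is known in advance (3 in this example).
rng(0);                                   % arbitrary seed, only for reproducibility
pd = makedist('Normal','mu',3);           % sigma defaults to 1
xTrunc = 3;
data_trunc = random(truncate(pd,xTrunc,inf),10000,1);

% Normal density renormalized so that it integrates to 1 over x >= xTrunc
pdf_truncnorm = @(x,mu,sigma) normpdf(x,mu,sigma) ./ (1 - normcdf(xTrunc,mu,sigma));

% mle needs a starting point for a custom pdf; the sample moments are a rough first guess
start = [mean(data_trunc), std(data_trunc)];
phat = mle(data_trunc,'pdf',pdf_truncnorm,'Start',start,'LowerBound',[-Inf 0]);

% Untruncated normal implied by the estimates: phat(1) is mu, phat(2) is sigma
pd_fit = makedist('Normal','mu',phat(1),'sigma',phat(2));
```

When overlaying pdf(pd_fit,xgrid) on the histogram of data_trunc, note that a histogram with 'Normalization','pdf' integrates to 1 over the truncated range only, so the untruncated curve would need to be divided by 1 - normcdf(xTrunc,phat(1),phat(2)) to sit on the same scale. The estimates may also be sensitive to the starting values when much of the distribution is cut off.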