I have written MATLAB code for a chi-square goodness-of-fit test. I was expecting p-values such as 0.897 or 0.287, but the values I get are far too small. Below is my code:

pd = fitdist(sample, 'weibull');
[h,p,st] = chi2gof(sample,'CDF',pd)

I've also tried the Anderson-Darling (AD) test, with a similar result:

dist = makedist('Weibull', 'a',A, 'b',B);
[h,p,ad,cv] = adtest(sample, 'Distribution',dist)

Below is a histogram of the data with a fitted Weibull density function (the Weibull parameters are A=4.0420 and B=2.0853):

histfit

Amro
fredd
  • hi @Luke Peterson.. no one has replied to my question.. do you have any suggestions? – fredd Jul 05 '14 at 12:46
  • you cannot force the p-value of the test, it either accepts or rejects your hypothesis. – Amro Jul 05 '14 at 18:51

1 Answer

When the p-value is less than a predetermined significance level (the default is 5%, or 0.05), the null hypothesis is rejected (which in your case means the test concludes that the sample did not come from a Weibull distribution).

The first output of the chi2gof function, h, is the test decision: h=1 means that the test rejects the null hypothesis at the specified significance level, and h=0 means that it does not.

Example:

sample = rand(1000,1);           % sample from Uniform distribution
pd = fitdist(sample, 'weibull');
[h,p,st] = chi2gof(sample, 'CDF',pd, 'Alpha',0.05)

The test clearly rejects H0 and concludes that the data did not come from a Weibull distribution:

h =
     1             % 1: H1 (alternate hypo), 0: H0 (null hypo)

p =
   2.8597e-27      % note that p << 0.05

st = 
    chi2stat: 141.1922
          df: 7
       edges: [0.0041 0.1035 0.2029 0.3023 0.4017 0.5011 0.6005 0.6999 0.7993 0.8987 0.9981]
           O: [95 92 92 97 107 110 102 95 116 94]
           E: [53.4103 105.6778 130.7911 136.7777 129.1428 113.1017 93.1844 72.8444 54.3360 110.7338]

Next let's try that again with a conforming sample:

>> sample = wblrnd(0.5, 2, [1000,1]);   % sample from a Weibull distribution

>> pd = fitdist(sample, 'weibull')
pd = 
  WeibullDistribution

  Weibull distribution
    A = 0.496413   [0.481027, 0.512292]
    B =  2.07314   [1.97524, 2.17589]

>> [h,p] = chi2gof(sample, 'CDF',pd, 'Alpha',0.05)
h =
     0
p =
    0.7340

The test now fails to reject H0 (h = 0), with a high p-value.
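A side note on the df field in the output above (df is 7 with 10 bins): as far as I understand the documented default, when 'CDF' is a fitted distribution object, chi2gof subtracts the number of estimated parameters from the degrees of freedom on its own, whereas with a plain function handle it cannot know that parameters were estimated unless 'NParams' is given. The following is only a sketch under that assumption; the fixed seed and the variable names (pdCdf, stObj, stFun, stAdj) are mine and purely illustrative.

```matlab
rng(2);                                   % fixed seed (assumption, for repeatability)
sample = wblrnd(0.5, 2, [1000,1]);        % sample from a Weibull distribution
pd = fitdist(sample, 'weibull');          % two parameters estimated from the data

% distribution object: df should already account for the estimated parameters
[~,~,stObj] = chi2gof(sample, 'CDF',pd);

% function handle: chi2gof has no way to know parameters were estimated
pdCdf = @(x) cdf(pd, x);
[~,~,stFun] = chi2gof(sample, 'CDF',pdCdf);

% function handle with the number of estimated parameters stated explicitly
[~,~,stAdj] = chi2gof(sample, 'CDF',pdCdf, 'NParams',2);

fprintf('df (object) = %d, df (handle) = %d, df (handle, NParams=2) = %d\n', ...
    stObj.df, stFun.df, stAdj.df);
```

If the three df values do not relate as described on your MATLAB version, trust the printed values and the chi2gof documentation over this note.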


EDIT:

Looking at the histogram you've shown, the data does appear to follow a Weibull distribution, although there may be outliers (look at the right side of the histogram), which could explain why you are getting bad p-values. Consider preprocessing your data to handle extreme outliers.

Here is an example where I simulate outlier values:

% 5000 samples from a Weibull distribution
pd = makedist('Weibull', 'a',4.0420, 'b',2.0853);
sample = random(pd, [5000 1]);
%sample = wblrnd(4.0420, 2.0853, [5000 1]);

% add 20 outlier instances
sample(1:20) = [rand(10,1)+15; rand(10,1)+25];

% hypothesis tests using original distribution
[h,p,st] = chi2gof(sample, 'CDF',pd, 'Alpha',0.05)
[h,p,ad,cv] = adtest(sample, 'Distribution',pd)

% hypothesis tests using a Weibull distribution fitted to the sample
[h,p,st] = chi2gof(sample, 'CDF',fitdist(sample,'Weibull'))
[h,p,ad,cv] = adtest(sample, 'Distribution', 'Weibull')

% show histogram
histfit(sample, 20, 'Weibull')

histfit

% chi-squared test
h =
     1
p =
    0.0382
st = 
    chi2stat: 8.4162
          df: 3
       edges: [0.1010 2.6835 5.2659 7.8483 25.9252]
           O: [1741 2376 764 119]
           E: [1.7332e+03 2.3857e+03 788.6020 92.5274]


% AD test
h =
     1
p =
   1.2000e-07
ad =
   Inf
cv =
    2.4924

The outliers are causing the distribution tests to fail (the null hypothesis is rejected). Still, I couldn't reproduce the NaN p-value (you might want to check this related question on Stats.SE about getting NaN p-values).
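The preprocessing suggested above could look something like the following minimal sketch. The 99.9th-percentile cutoff, the fixed seed, and the names pdRaw, cutoff, trimmed and pdTrim are assumptions chosen for illustration; they are not derived from your actual data. Note also that trimming changes what is being tested (a truncated sample), so treat the resulting p-value with some caution.

```matlab
rng(0);                                   % fixed seed (assumption, for repeatability)

% same simulated data as above: Weibull sample plus 20 outliers
pd = makedist('Weibull', 'a',4.0420, 'b',2.0853);
sample = random(pd, [5000 1]);
sample(1:20) = [rand(10,1)+15; rand(10,1)+25];

% hypothetical rule: drop points above the 99.9th percentile
% of a Weibull distribution fitted to the raw data
pdRaw   = fitdist(sample, 'Weibull');
cutoff  = icdf(pdRaw, 0.999);
trimmed = sample(sample <= cutoff);

% refit on the trimmed data and repeat the chi-squared test
pdTrim = fitdist(trimmed, 'Weibull');
[h, p] = chi2gof(trimmed, 'CDF',pdTrim, 'Alpha',0.05);

fprintf('removed %d points above %.3f, h = %d, p = %.4f\n', ...
    numel(sample) - numel(trimmed), cutoff, h, p);
```

A fixed quantile is only one possible rule; whichever rule you pick, choose it before looking at the test result rather than tuning it until the p-value looks good.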

Community
Amro
  • hi @Amro, when i display the PDF for weibull, it fit quite well my data....this is why i am finding fishy..i tested with AD and also same results.!sample=N; dist = makedist('Weibull','a',pd3.ParameterValues(1),'b',pd3.ParameterValues(2)) [h,p,ad,cv]= adtest(sample,'Distribution',dist) – fredd Jul 05 '14 at 19:07
  • what does h=0 and p=NaN means then? – fredd Jul 05 '14 at 19:11
  • can you show the output of say `hist(sample,20)`? Do you know beforehand the parameters of the Weibull distribution shape/scale, maybe the "fit distribution" part is not converging on good values? – Amro Jul 05 '14 at 19:29
  • all the [hypotheses testing](http://www.mathworks.com/help/stats/available-hypothesis-tests.html) functions in the Statistics toolbox follow the same [convention](http://www.mathworks.com/help/stats/hypothesis-test-terminology.html), `H=0` implies that null hypothesis is accepted, and `H=1` means the null hypothesis is rejected (at the significance level specified). As to why you get NaN p-value, perhaps the calculation degenerates because intermediate too big/small extreme values are generated somewhere in the process, just a guess.. – Amro Jul 05 '14 at 19:34
  • hi.. hist(sample,20) shows a histogram with bin size 20.... let me show u my data fit with weibull.. yeah i mean, i can generate the weibull parameters. – fredd Jul 05 '14 at 19:43
  • yes, please edit your question above and add a plot showing the data distribution. Also if you have knowledge of the distribution parameters, please mention them as well. By the way, you might wanna play with the `dfittool` tool to get a visual confirmation.. – Amro Jul 05 '14 at 19:48
  • i did already use dfittool, but it does not help in the tests.. AD or KS or CS. – fredd Jul 05 '14 at 19:54
  • @fredd: see my edit. I think you have outliers in your data that might be skewing the outcome of the tests.. – Amro Jul 05 '14 at 21:07
  • yeah you are right, for all 3 tests when i preprocess my data i no longer obtain NaN for p.. my p values do remain small though.. – fredd Jul 06 '14 at 07:46