I have two kinds of data lists, historical data and simulated data, and I want to compare them to see whether they follow the same distribution. My code is as follows:

import scipy.stats as stats

# Historical values and two simulated series, one entry per category
data_hist = [164, 157, 145, 113, 127, 192, 214, 193, 107, 95, 60, 55, 30, 19, 22, 22, 19, 20]
data_sim1 = [160, 174, 142, 121, 122, 192, 198, 179, 119, 107, 63, 50, 26, 17, 16, 22, 23, 23]
data_sim2 = [181, 130, 152, 114, 122, 198, 183, 192, 105, 100, 85, 42, 37, 26, 25, 30, 17, 15]

# Chi-square test: simulated values as observed, historical values as expected
print(stats.chisquare(data_sim1, f_exp=data_hist))
print(stats.chisquare(data_sim2, f_exp=data_hist))

The code gives the following output:

Power_divergenceResult(statistic=12.11387994054504, pvalue=0.79319278886052769)
Power_divergenceResult(statistic=34.413397609752003, pvalue=0.0074220617004927226)
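
To make clear what these statistics are, here is a minimal sketch that recomputes them by hand with the usual Pearson formula, the sum of (observed - expected)^2 / expected over all categories. The names `hist`, `sim1`, `sim2` and `pearson_chi2` are only illustrative; the values are copied from the lists above.

```python
# Values copied from the question (hypothetical variable names).
hist = [164, 157, 145, 113, 127, 192, 214, 193, 107, 95, 60, 55, 30, 19, 22, 22, 19, 20]
sim1 = [160, 174, 142, 121, 122, 192, 198, 179, 119, 107, 63, 50, 26, 17, 16, 22, 23, 23]
sim2 = [181, 130, 152, 114, 122, 198, 183, 192, 105, 100, 85, 42, 37, 26, 25, 30, 17, 15]


def pearson_chi2(observed, expected):
    """Return sum((O - E)^2 / E) over paired categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


stat1 = pearson_chi2(sim1, hist)
stat2 = pearson_chi2(sim2, hist)

# With no parameters estimated from the data, the degrees of freedom
# are the number of categories minus one.
dof = len(hist) - 1

print(stat1, stat2, dof)
```

The two statistics should agree with the `statistic` fields printed above, and the p-values are then the upper-tail probabilities of a chi-square distribution with 17 degrees of freedom.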

I also compared the same data lists with each other using the F-test in Excel and got p-values of 0.939 and 0.849, respectively.
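
For context, an F-test of this kind is normally a test of equal variances, so it looks at a different quantity than the chi-square test. The sketch below assumes that is what the Excel comparison did (I have not reproduced the Excel p-values here) and only computes the variance ratios the test would be based on. The variable names are illustrative.

```python
import statistics

# Values copied from the question (hypothetical variable names).
hist = [164, 157, 145, 113, 127, 192, 214, 193, 107, 95, 60, 55, 30, 19, 22, 22, 19, 20]
sim1 = [160, 174, 142, 121, 122, 192, 198, 179, 119, 107, 63, 50, 26, 17, 16, 22, 23, 23]
sim2 = [181, 130, 152, 114, 122, 198, 183, 192, 105, 100, 85, 42, 37, 26, 25, 30, 17, 15]

# Sample variances (n - 1 in the denominator).
var_hist = statistics.variance(hist)

# An F statistic for equal variances is a ratio of two sample variances.
f_ratio1 = statistics.variance(sim1) / var_hist
f_ratio2 = statistics.variance(sim2) / var_hist
print(f_ratio1, f_ratio2)

# The variance ignores which category each value belongs to:
# reordering a list leaves it unchanged, whereas the chi-square
# statistic compares the lists position by position.
var_sim2_sorted = statistics.variance(sorted(sim2))
print(abs(var_sim2_sorted - statistics.variance(sim2)) < 1e-9)
```

A ratio close to 1 only says the two lists have a similar spread, which is a much weaker statement than agreeing category by category.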

Now my question is: am I using the correct chi-square function to calculate the p-value, and how do I interpret it to decide whether to reject the null hypothesis? Why is there such a big difference between the p-values from the two methods?
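
As I understand the usual decision rule, the p-value is compared with a significance level chosen in advance. Here is a minimal sketch, assuming a level of 0.05 (my assumption, not something fixed by the data) and using the two chi-square p-values printed above. The helper `decide` is hypothetical.

```python
ALPHA = 0.05  # assumed significance level


def decide(p_value, alpha=ALPHA):
    """Apply the usual rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-values copied from the chi-square output in the question.
p_sim1 = 0.79319278886052769
p_sim2 = 0.0074220617004927226

print(decide(p_sim1))  # fail to reject H0
print(decide(p_sim2))  # reject H0
```

Here H0 is that the simulated values follow the historical distribution, so under this rule the first simulation is compatible with the historical data and the second is not.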

Pierre