I am trying to estimate the goodness of fit of a normal distribution to my errors using a chi-squared-based criterion.
Specifically, I have a sample and its estimate. From these I calculate the approximation errors. I then treat these errors as new (assumed normally distributed) observations, O, where the expected theoretical value, E, is either the mean of these errors or 0 (ideally the estimate would be perfect).
According to https://en.wikipedia.org/wiki/Goodness_of_fit, the reduced chi-squared statistic (chi-squared divided by the degrees of freedom) should be equal to 1 for an exact fit, which a priori I do not expect.
I want an approximate fit, and what I get is a statistic of roughly 1.3 to 1.5. On small samples it sometimes rises to 2-3.
Is this considered an acceptable fit?
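I implemented this in Python, so the code is
def chi_squared(error, mean, var, N, n):
    # errors measured against an expected value of 0; mean is unused here
    return ((error)**2 / var).sum(0) / (N - n - 1)
or
def chi_squared(error, mean, var, N, n):
    # errors measured against their own mean
    return ((error - mean)**2 / var).sum(0) / (N - n - 1)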
where N is the number of observations (len(error)) and n = 2 (the number of parameters I am trying to fit, namely mean and var).
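To make the computation concrete, here is a minimal, self-contained sketch of the same formula applied to the sample shown further down. The name reduced_chi_squared and the choice of ddof=1 for the variance (the pandas default) are assumptions made for illustration, not necessarily what my real code does.

```python
import numpy as np

def reduced_chi_squared(error, mean, var, n_params):
    """Sum of squared standardised deviations divided by N - n_params - 1."""
    error = np.asarray(error, dtype=float)
    dof = error.shape[0] - n_params - 1  # same denominator as in the question
    return ((error - mean) ** 2 / var).sum(axis=0) / dof

errors = np.array([-0.626637, -0.466102, 0.235232, -1.803282,
                   -0.376370, -0.891675, -0.347168, 0.000000])

mean = errors.mean()
var = errors.var(ddof=1)  # assumption: unbiased sample variance

chi2_zero = reduced_chi_squared(errors, 0.0, var, n_params=2)   # E = 0
chi2_mean = reduced_chi_squared(errors, mean, var, n_params=2)  # E = mean of errors
print(chi2_zero, chi2_mean)
```

One thing worth noting: when var is the sample variance of the same errors (ddof=1), the mean-centred sum of squares divided by var is N - 1 by construction, so the mean-centred version always equals (N - 1)/(N - n - 1), whatever the data look like.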
It seems to work fairly well (I think) with as few as 6-8 observations, which is strange, since you need a sufficiently large sample to approximate a Gaussian (at least 10 samples, etc.). I would have expected higher values of the chi-squared statistic.
Sample of the data:
[-0.626637 -0.466102 0.235232 -1.803282 -0.376370 -0.891675 -0.347168 0.000000]
From this I compute the mean and var and apply the above procedure. (My true data is a pandas DataFrame where each column contains a series like the one above, hence .sum(0); plain sum() works for other data types.)
Following a comment from @tom: my data is numerical rather than categorical, so scipy.stats.chisquare cannot be used. It seems that I need to calculate the chi-squared statistic and p-values myself, unless there is a way to do it directly in Python?
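For reference, this is a sketch of what computing it by hand might look like, assuming SciPy is available. It uses scipy.stats.chi2.sf to turn the (unreduced) statistic into a p-value, and, as a possible alternative for continuous data, scipy.stats.shapiro (the Shapiro-Wilk normality test). The degrees of freedom N - n - 1 and the ddof=1 variance are carried over from above as assumptions; I am not sure they are the right choice.

```python
import numpy as np
from scipy import stats

errors = np.array([-0.626637, -0.466102, 0.235232, -1.803282,
                   -0.376370, -0.891675, -0.347168, 0.000000])

n_params = 2
dof = len(errors) - n_params - 1  # assumption: same dof as in my function

var = errors.var(ddof=1)  # assumption: unbiased sample variance
chi2_stat = ((errors - errors.mean()) ** 2 / var).sum()  # unreduced statistic

# Survival function: P(X >= chi2_stat) for X ~ chi-squared with dof degrees of freedom
p_chi2 = stats.chi2.sf(chi2_stat, dof)

# Shapiro-Wilk test of normality, applied directly to the continuous errors
shapiro_stat, p_shapiro = stats.shapiro(errors)

print("chi2 =", chi2_stat, "dof =", dof, "p =", p_chi2)
print("Shapiro-Wilk W =", shapiro_stat, "p =", p_shapiro)
```

A small p-value in either test would be evidence against the assumed distribution. With only 8 observations I would not expect either test to have much power.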
Thank you in advance.