
I am trying to estimate the goodness of fit of a normal distribution to a set of errors, using a chi-squared-based criterion.

Specifically, I have a sample and its estimate, from which I calculate the approximation errors. I then treat these errors as new (assumed normally distributed) observations, O, where the expected theoretical value, E, is either the mean of these errors or 0 (ideally the estimate would be perfect).

According to https://en.wikipedia.org/wiki/Goodness_of_fit, the reduced chi-squared statistic should equal 1 for an exact fit, which a priori I do not expect.
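For reference, this is the statistic I have in mind, written as a sketch in the notation of this question. The symbols N (number of observations), n (number of fitted parameters) and \sigma^2 (the variance of the errors) are my own labels and match the arguments of the code in this post:

```latex
\chi^2_{\mathrm{red}} = \frac{1}{N - n - 1} \sum_{i=1}^{N} \frac{(O_i - E)^2}{\sigma^2}
```

Here E is either 0 or the mean of the errors, depending on which variant is used.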

I only want an approximate fit, and what I get is a chi-squared statistic of about 1.3 to 1.5. On small samples it sometimes rises to 2 or 3.

Is this considered an acceptable fit?

I implemented this in Python. The code is either

def chi_squared(error, mean, var, N, n):
    # Errors measured against E = 0; `mean` is unused in this variant.
    return (error**2 / var).sum(0) / (N - n - 1)

or

def chi_squared(error, mean, var, N, n):
    # Errors measured against E = mean of the errors.
    return ((error - mean)**2 / var).sum(0) / (N - n - 1)

where N is the number of observations (len(error)) and n = 2 is the number of parameters I am trying to fit (mean and var).

It works fairly well (I think) with as few as 6-8 observations, which is strange, since you need a sufficiently large sample to approximate a Gaussian (at least 10 samples or so). I would expect higher values of the chi-squared statistic.
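A minimal, self-contained sketch of the computation described above, applied to the eight sample values from this question. It assumes that var is the sample variance (ddof=1, which is what pandas' .var() uses by default) estimated from the same errors; the function name reduced_chi_squared and its n_params argument are illustrative, not part of my original code. Under that assumption the mean-centred variant reduces algebraically to (N - 1) / (N - n - 1), whatever the data are:

```python
import numpy as np


def reduced_chi_squared(error, expected, var, n_params):
    """Sum of squared deviations from `expected`, scaled by `var`,
    divided by N - n_params - 1 degrees of freedom."""
    error = np.asarray(error, dtype=float)
    dof = error.size - n_params - 1
    return ((error - expected) ** 2 / var).sum() / dof


errors = np.array([-0.626637, -0.466102, 0.235232, -1.803282,
                   -0.376370, -0.891675, -0.347168, 0.000000])

mean = errors.mean()
# Assumption: sample variance with ddof=1, the pandas default for .var().
var = errors.var(ddof=1)

# Variant with E = mean of the errors.
stat_centered = reduced_chi_squared(errors, mean, var, n_params=2)
# Variant with E = 0.
stat_zero = reduced_chi_squared(errors, 0.0, var, n_params=2)

print(stat_centered, stat_zero)
```

With N = 8 and n = 2 the centred variant equals 7 / 5 = 1.4, because the sum of squared deviations divided by the ddof=1 variance is always N - 1. If var is obtained differently (for example with ddof=0 or from an external model), this identity no longer holds.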

Sample of the data:

[-0.626637 -0.466102 0.235232 -1.803282 -0.376370 -0.891675 -0.347168 0.000000]

From this I compute the mean and var and apply the above procedure. (My real data is a pandas DataFrame in which each column contains a series like the one above, hence .sum(0); plain sum() works with other data types.)

Following a comment from @tom: the data I am using is numerical rather than categorical, so scipy.stats.chisquare cannot be applied directly. It seems that I need to calculate the chi-squared statistic and p-values myself, unless there is a way to do this directly in Python?
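As a sketch of what a direct route for continuous values might look like, SciPy does provide normality tests that take the raw values instead of frequencies. The example below uses scipy.stats.shapiro and scipy.stats.kstest on the sample from this question. These are different tests from the chi-squared criterion above, and because the mean and standard deviation passed to kstest are estimated from the same data, its p-value should be treated as approximate only:

```python
import numpy as np
from scipy import stats

errors = np.array([-0.626637, -0.466102, 0.235232, -1.803282,
                   -0.376370, -0.891675, -0.347168, 0.000000])

# Shapiro-Wilk test: the null hypothesis is that the data are normal.
sw_stat, sw_p = stats.shapiro(errors)

# Kolmogorov-Smirnov test against a normal distribution whose parameters
# are estimated from the data (this makes the p-value only approximate).
mu = errors.mean()
sigma = errors.std(ddof=1)
ks_stat, ks_p = stats.kstest(errors, "norm", args=(mu, sigma))

print("Shapiro-Wilk:       W = %.4f, p = %.4f" % (sw_stat, sw_p))
print("Kolmogorov-Smirnov: D = %.4f, p = %.4f" % (ks_stat, ks_p))
```

A small p-value would count as evidence against normality. With only 8 observations, either test has little power, so a large p-value does not confirm that the errors are Gaussian.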

Thank you in advance.

user3861925
  • why not use [scipy.stats.chisquare](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html)? – tmdavison Aug 20 '15 at 09:39
  • 1) Thank you for pointing out that scipy.stats.chisquare exists; 2) unfortunately (for me) scipy.stats.chisquare works on categorical data and uses frequencies as their input. I am working with numerical continuous data, so there are no frequencies to be used but actual values... I did not find any built in functions for continuous data. Pointer will be appreciated. – user3861925 Aug 20 '15 at 10:39
  • It might be useful if you post an example of your data – tmdavison Aug 20 '15 at 10:45
  • Sample of the data: [-0.626637 -0.466102 0.235232 -1.803282 -0.376370 -0.891675 -0.347168 0.000000]. From here I compute the mean and var and apply the above procedure (my true data is a pd.DF where each column contains a series as above, hence .sum(0). Can be used with sum() when using other data types) – user3861925 Aug 20 '15 at 11:02
  • @user3861925 Please do not use comments to add things to your question. Edit your question instead. – Roland Smith Aug 20 '15 at 12:31

0 Answers