I am trying to plot a chi-squared probability density function fitted to experimental data collected under different conditions in Python. My code is shown below.

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as ss

    data = []  # read from CSV file

    # Grid of x values on which to evaluate the fitted PDF
    linespace = np.linspace(4, 1500, len(data))
    # For chi2, fit returns (df, loc, scale)
    x, y, z = ss.chi2.fit(data)
    pdf_chi2 = ss.chi2.pdf(linespace, x, y, z)

    plt.hist(data, bins=100, density=True, alpha=0.3)
    plt.plot(linespace, pdf_chi2, label='Chi2')
    plt.legend()
    plt.show()

I have roughly 1500 observations per example, and most of the time when I run the code above I get a nicely fitted distribution.

[Image: good fit distribution]

Sometimes, when I run the same code on a different dataset, the probability density function explodes near 0 and does not appear to have been fitted to the dataset at all.

[Image: bad fit distribution]

Has anyone experienced this before, and if so, how did you resolve it?

iato
  • My first guess is that the `ss.chi2.fit` function is not finding a valid solution; one possible cause is that it started from an initial guess which led it farther away from a valid solution instead of closer. Look at whatever `ss.chi2.fit` returns -- is the funny-looking plot associated with funny parameters? Is it repeatable -- if you run it twice on the same data, do you get the same result? You should be able to get in the neighborhood of the optimal parameters by approximating the chi2 distribution with a normal distribution -- are the parameters returned by `ss.chi2.fit` not too far off? – Robert Dodier Mar 22 '23 at 03:24
  • FYI (and not necessarily something that will fix the problem that you are having): it looks like your data is greater than 0, which is natural for the standard chi-square distribution. The scipy implementation of the distribution includes a *location* parameter that allows the support to be shifted. The `fit` method can return a negative location, which means the distribution allows negative values. Given the meaning of the chi-square distribution, it would be unusual to actually want this behavior. So normally, you should constrain `fit` to force the location to be 0 with the parameter `floc=0`. – Warren Weckesser Mar 22 '23 at 05:25
  • That is, I suspect you should call `fit` like this: `x,y,z = ss.chi2.fit(data, floc=0)` – Warren Weckesser Mar 22 '23 at 05:29
  • Sure enough floc=0 fixed it! Thank you :) – iato Mar 22 '23 at 17:43
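
For anyone landing here later, this is a minimal sketch of the fix suggested in the comments, not the asker's actual script. The data is synthetic (a scaled chi-square sample standing in for the CSV), and the seed, sample parameters and variable names such as `df_fit` and `df_mom` are assumptions made for illustration. It pins the location with `floc=0`, then compares the fitted parameters against a rough moment-based estimate, in the spirit of checking whether `fit` returned "funny" parameters. The moment formulas use the fact that, with the location at 0, a scaled chi-square has mean `df * scale` and variance `2 * df * scale**2`.

```python
import numpy as np
import scipy.stats as ss

# Synthetic stand-in for the experimental data (assumption: the real data
# is strictly positive, as it would be for a chi-square-like quantity).
rng = np.random.default_rng(0)
data = 40.0 * rng.chisquare(df=5, size=1500)

# Fit with the location fixed at 0 so the support cannot shift below zero.
df_fit, loc_fit, scale_fit = ss.chi2.fit(data, floc=0)

# Rough sanity check via the method of moments (valid when loc = 0):
#   mean = df * scale,  var = 2 * df * scale**2
mean, var = data.mean(), data.var()
scale_mom = var / (2.0 * mean)
df_mom = 2.0 * mean**2 / var

print("fitted :", df_fit, loc_fit, scale_fit)
print("moments:", df_mom, 0.0, scale_mom)

# Evaluate the fitted PDF over the observed range of the data.
grid = np.linspace(data.min(), data.max(), 500)
pdf_fit = ss.chi2.pdf(grid, df_fit, loc=loc_fit, scale=scale_fit)
```

If the fitted `df` and `scale` land far from the moment-based values, that is a hint the optimizer wandered off, which matches the symptom of the density blowing up near 0.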