I have a data set of experimental measurements (y_observed, t), where y_observed holds the measured values and t the time in seconds since the start of the measurement. I fit a Gaussian to this data, but I want to assess whether a Gaussian fit makes sense or whether the observed data is non-Gaussian.

I've tried to implement the two-sample Kolmogorov-Smirnov goodness-of-fit test in Python, but I'm not sure whether I am implementing and interpreting it correctly. A small example is below: y_observed does not look like a Gaussian (see the attached figure), yet according to the two-sample K-S test it does follow a Gaussian distribution. Can someone please explain what I'm doing wrong? Thanks in advance!

[Figure: image not included]

from scipy.stats import ks_2samp
import numpy as np

y_observed = [
    1.93291, 9.84869, 29.5713, 20.6346, -1.51537,
    0.396292, 50.8895, 68.855, 63.9291, 55.6863,
    37.6503, 46.87, 33.6637, 25.0395, 18.3027,
    38.0947, 27.9305, 10.2012, -0.704519, 18.6656,
    20.2873, 8.78955, 4.39672
]


t = [
    519, 522, 525, 528, 531,
    534, 537, 540, 543, 546,
    549, 552, 555, 558, 561,
    564, 567, 570, 573, 576,
    579, 582, 585
]


y_fit = [
    5.60417, 8.74951, 12.9907, 18.3426, 24.63,
    31.4518, 38.1947, 44.1102, 48.4454, 50.5991,
    50.2586, 47.4739, 42.6458, 36.4314, 29.5973,
    22.8668, 16.8011, 11.7394, 7.80065, 4.92939,
    2.96233, 1.69298, 0.920122
] 


count, bins_count = np.histogram(y_observed, bins=5)
pdf = count / sum(count)
cdf = np.cumsum(pdf)
count_2, bins_count_2 = np.histogram(y_fit, bins=5)
pdf_2 = count_2 / sum(count_2)
cdf_2 = np.cumsum(pdf_2)
KS_stat, P_val = ks_2samp(cdf, cdf_2)
n_1, n_2 = len(y_observed), len(y_fit)
critical_val = np.sqrt((n_1 + n_2) / (n_1 * n_2)) * 1.36
if P_val < 0.05 and KS_stat > critical_val:
    print("Observed data does not follow a Gaussian distribution")
else:
    print("Observed data follows a Gaussian distribution")
Misterrik
  • Looking at the documentation, it seems the implemented function takes data drawn from a certain PDF, not the PDF itself. – mikuszefski Feb 03 '23 at 06:58
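
To illustrate the point in the comment: ks_2samp treats each input array as a set of raw observations and builds the empirical CDFs itself. Passing the five cumulative histogram values, as in ks_2samp(cdf, cdf_2), therefore compares two "samples" of only five numbers each. Below is a minimal sketch of the intended usage. The arrays normal_a, normal_b and uniform_c are made-up synthetic samples, not the data from the question, and the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Hypothetical raw samples (illustration only, not the question's data).
# Note: these are individual observations, not histograms or CDFs.
normal_a = rng.normal(loc=0.0, scale=1.0, size=500)
normal_b = rng.normal(loc=0.0, scale=1.0, size=500)
uniform_c = rng.uniform(low=-3.0, high=3.0, size=500)

# ks_2samp computes the empirical CDF of each sample internally.
same = ks_2samp(normal_a, normal_b)    # both drawn from the same distribution
diff = ks_2samp(normal_a, uniform_c)   # drawn from different distributions

print(f"normal vs normal:  D={same.statistic:.3f}, p={same.pvalue:.3g}")
print(f"normal vs uniform: D={diff.statistic:.3f}, p={diff.pvalue:.3g}")
```

Note that this tests whether two sets of values share a distribution. Whether that is the right question for a curve y(t) fitted against time is a separate matter that this sketch does not address.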
