I am looking for advice on how to determine whether the distribution of my model output is similar to the distribution of the observed dataset and, if so, how similar.
I have a GBM model with mean reversion that gives seemingly good results when I compare its output distribution to the observed data. You can see the two PDFs side by side in the attached picture.
[Figure: PDF of observed and model data]
Both datasets are huge (~6 million data points), and I am starting to suspect that this is part of the problem.
I am looking for a way to verify that the distributions of the two datasets are similar. I tried the two-sample Kolmogorov-Smirnov test and the two-sample t-test, but both rejected the null hypothesis every time, even at different significance levels (alpha). In some threads I have read that these tests are unreliable when applied to huge datasets, but there was no consensus on this.
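To make the sample-size suspicion concrete, here is a minimal sketch in Python (standard library only, not my Matlab code and not my data). It assumes two idealised samples taken from evenly spaced standard-normal quantiles, one shifted by a made-up 0.02 standard deviations, and it uses the usual large-sample approximation for the two-sample KS critical value. The function names are only illustrative.

```python
import bisect
import math
from statistics import NormalDist


def ks_statistic(x, y):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions, so the gap only changes at data points.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / n
        fy = bisect.bisect_right(ys, t) / m
        d = max(d, abs(fx - fy))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the D threshold for rejecting H0 at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def shifted_normal_pair(n, shift):
    """Idealised samples (an assumption): n normal quantiles and a shifted copy."""
    base = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return base, [v + shift for v in base]


results = {}
for n in (1_000, 200_000):
    x, y = shifted_normal_pair(n, shift=0.02)
    d = ks_statistic(x, y)
    crit = ks_critical_value(n, n)
    results[n] = (d, crit)
    print(f"n={n}: D={d:.4f}, critical={crit:.4f}, reject={d > crit}")

# Threshold for two samples of about 6 million points each, as in my case.
print(f"critical D at 6e6 points: {ks_critical_value(6_000_000, 6_000_000):.6f}")
```

The gap between the two CDFs is the same at both sample sizes, but the rejection threshold shrinks like 1/sqrt(n), so the same small shift passes at 1,000 points and is rejected at 200,000. With about 6 million points per sample the threshold at alpha = 0.05 is roughly 0.0008, so even a CDF gap of a tenth of a percent would be rejected. That is why I suspect the D statistic itself may be more informative as a measure of how similar the distributions are than the reject/accept decision.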
I am currently using Matlab, but I am open to other tools if necessary.
Any help would be appreciated! I am primarily looking for a hypothesis test for verification, but if you have a different idea, don't hold back!