I need to find the distribution of data from a retail chain network (demand for a product across all stores). I tried fitting distributions with EasyFit, which checks 82 candidate distributions, but none of them fits the data. What can be done? Is there any way to find out whether the data distribution is a sum or convolution of multiple distributions? I have removed spikes, seasonality and promotional data from the dataset, but still no distribution fits.
-
See the thread on stats.stackexchange.com for some useful discussion: http://stats.stackexchange.com/questions/112349/automatic-identification-of-distribution-of-data – Luca May 10 '15 at 10:51
-
Essentially, there is no reason whatsoever that real-life data should follow some convenient parametric distribution. As the thread says, even if it did, it would be impossible to prove. The best you can do is fit some distributions and choose one based on some distance metric between the distributions, for example the KL divergence. – Luca May 10 '15 at 10:53
-
You can try fitting some mixture distributions once you isolate some trends; perhaps that would describe your underlying dataset better. Also, I am not sure that removing data that does not fit your assumptions is a good idea! – Luca May 10 '15 at 10:54
-
@Luca How do I fit mixture distributions (if you are talking about a sum of distributions)? I have removed the seasonality (promotion) effect. – Manu9 May 10 '15 at 18:21
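One possible way to fit a mixture, sketched here for count data such as daily demand: a two-component Poisson mixture estimated with the EM algorithm. Note that a mixture (each observation comes from one of several components) is not the same thing as a sum of random variables (a convolution); the sum of independent Poisson variables is itself Poisson, so it adds no flexibility. The function name `fit_poisson_mixture`, the quantile-based starting values and the synthetic `demo_counts` are assumptions made for illustration, not anything from the thread or from EasyFit.

```python
import numpy as np
from scipy import stats

def fit_poisson_mixture(counts, n_components=2, n_iter=200):
    """EM for a mixture of Poisson distributions (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: start the rates at spread-out sample quantiles.
    rates = np.quantile(counts, np.linspace(0.25, 0.75, n_components)) + 0.5
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component per observation.
        log_p = stats.poisson.logpmf(counts[:, None], rates[None, :])
        log_p = log_p + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and component rates.
        nk = resp.sum(axis=0) + 1e-12  # small floor avoids division by zero
        weights = nk / nk.sum()
        rates = np.maximum((resp * counts[:, None]).sum(axis=0) / nk, 1e-9)
    return weights, rates

# Synthetic stand-in for demand data: a slow regime and a busy regime.
rng = np.random.default_rng(0)
demo_counts = np.concatenate([rng.poisson(2.0, 600), rng.poisson(10.0, 400)])
demo_weights, demo_rates = fit_poisson_mixture(demo_counts)
print("weights:", np.round(demo_weights, 3))
print("rates:  ", np.round(demo_rates, 3))
```

If the starting rates coincide (for example when most counts are identical), the components never separate, so different starting values would be needed in that case.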
2 Answers
It depends on what you mean by "no distribution fits the data". You get the best fit by ranking the distributions (using Kolmogorov-Smirnov, Anderson-Darling, chi-squared or other test statistics). You won't get a perfect fit, because the distributions are theoretical; you either work with the best fit or don't work with one at all. Post some of the data and the best-fit test statistic, or elaborate on the question. Sometimes you just need to accept that the data is poorly constructed, or that analysing it makes no sense.
If the question is purely statistical in nature, you might be better off posting it on https://stats.stackexchange.com/.
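As a rough illustration of ranking candidates by a test statistic, the sketch below fits a few SciPy distributions by maximum likelihood and sorts them by their Kolmogorov-Smirnov statistic. The helper `rank_fits`, the candidate list and the synthetic `sample` are assumptions for illustration; this is not how EasyFit works internally. Two caveats: the p-values are optimistic because the parameters are estimated from the same data, and the KS test assumes a continuous distribution, so for integer demand counts the statistic is better read as a ranking score than as a formal test.

```python
import numpy as np
from scipy import stats

def rank_fits(data, candidates=("norm", "gamma", "lognorm", "expon")):
    """Fit each candidate by maximum likelihood and rank by KS statistic."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # shape(s), loc, scale
        ks_stat, p_value = stats.kstest(data, name, args=params)
        results.append((name, ks_stat, p_value))
    # Smaller KS statistic means the fitted CDF is closer to the empirical CDF.
    return sorted(results, key=lambda r: r[1])

# Synthetic stand-in for the real data.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=3.0, size=500)
for name, ks_stat, p_value in rank_fits(sample):
    print(f"{name:8s} KS={ks_stat:.3f} p={p_value:.3f}")
```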
-
By "no distribution fits the data" I mean that all the tests (KS, AD and chi-square) reject the fitted distributions. The tool I used is EasyFit, which has some 82 distributions. – Manu9 May 10 '15 at 13:04
-
Then fit a polynomial around it manually if it's really necessary, but if you can't find a reasonable fit using 80 popular distributions, then in all likelihood curve fitting is not something you should be doing. – ajsp May 10 '15 at 15:47
-
I am looking for methods that will allow me to fit multiple distributions, like the sum of two distributions. Let me know if anything else is possible. @ajsp – Manu9 May 10 '15 at 16:25
-
I don't have a clue what you mean, you should sleep on it, come back to the problem when you know what you need. – ajsp May 10 '15 at 17:16
Have you tried transforming the data? Try several transformations and keep the one that brings the data closest to a distribution amenable to statistical inference.

-
Can you please elaborate on what you mean by transforming data? The data is daily level store demand data for each product. I need to find the distribution which will represent the demand. @alexread – Manu9 May 10 '15 at 16:24
-
So, you can try a ladder of transformations such as log, square, inverse etc. Stata and R have packages for these. For instance, income data is mostly skewed and can often be brought close to a normal distribution via a logarithmic transformation. – AlxRd May 11 '15 at 15:14
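A minimal sketch of such a ladder in Python, using sample skewness as a crude score for how symmetric each transformed version is. The helper `ladder_of_powers`, the `+1` shift (so that zero demand survives the log and inverse) and the synthetic `skewed` sample are assumptions for illustration; the data is assumed to be non-negative.

```python
import numpy as np
from scipy import stats

def ladder_of_powers(data):
    """Return the sample skewness of the data under several transformations."""
    data = np.asarray(data, dtype=float)
    shifted = data + 1.0  # assumption: shift so zeros are allowed
    transforms = {
        "identity": data,
        "sqrt": np.sqrt(data),
        "log(x+1)": np.log(shifted),
        "inverse 1/(x+1)": 1.0 / shifted,
        "square": data ** 2,
    }
    return {name: float(stats.skew(values)) for name, values in transforms.items()}

# Synthetic right-skewed stand-in for the real data.
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=3.0, sigma=0.8, size=1000)
skewness = ladder_of_powers(skewed)
for name, value in sorted(skewness.items(), key=lambda kv: abs(kv[1])):
    print(f"{name:16s} skewness={value:+.3f}")
```

Skewness near zero only says the transformed data is roughly symmetric; it would still need a proper check (a Q-Q plot, for example) before treating it as normal.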
-
The data is not skewed, but the shape does not decrease smoothly. There are ups and downs at the tails (because of which it is rejected by the Anderson-Darling test). You can see the data at the following link: https://docs.google.com/spreadsheets/d/1b1urdAxy3EsqpBKEyD9_d5qgZei4bna7nUR1raBBgoY/edit?usp=sharing @alexread – Manu9 May 11 '15 at 16:17
-
Manu9 - My rough observation of the data is that what you have is a Poisson distribution. Tabulate the frequency of each demand count (0, 1, 2, 3, etc.) overall, by store, by season, and so on. There are multiple ways to approach this data; the most sensible one is to treat stores as individuals to start with. I was not able to actually check out the data since it is in protected view. This is also why it is good to provide an excerpt of the data along with your question from the start. – AlxRd May 12 '15 at 16:34
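The tabulation suggested above could look roughly like this in Python: count how often each demand value occurs, compare with the frequencies a Poisson distribution with the same mean would give, and compute a chi-square goodness-of-fit statistic. The helper `poisson_check`, the minimum expected count of 5 per bin and the synthetic `demo_demand` are assumptions for illustration. The variance-to-mean ratio is included because a Poisson distribution has a ratio of 1; a ratio well above 1 points to overdispersion, which is one common reason count data "looks Poisson" yet fails the test.

```python
import numpy as np
from scipy import stats

def poisson_check(counts, min_expected=5.0):
    """Tabulate counts and test them against a Poisson with the same mean."""
    counts = np.asarray(counts, dtype=int)
    n = counts.size
    lam = counts.mean()
    dispersion = counts.var(ddof=1) / lam  # about 1 for Poisson data
    values = np.arange(counts.max() + 1)
    observed = np.bincount(counts, minlength=values.size).astype(float)
    expected = n * stats.poisson.pmf(values, lam)
    expected[-1] += n * stats.poisson.sf(values[-1], lam)  # upper tail mass
    # Merge neighbouring bins until each has a large enough expected count.
    obs_bins, exp_bins = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_bins.append(o_acc)
            exp_bins.append(e_acc)
            o_acc = e_acc = 0.0
    if exp_bins and (o_acc > 0 or e_acc > 0):
        obs_bins[-1] += o_acc  # fold the leftover tail into the last bin
        exp_bins[-1] += e_acc
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_bins, exp_bins))
    dof = len(obs_bins) - 2  # bins - 1, minus 1 for the estimated mean
    p_value = float(stats.chi2.sf(chi2, dof)) if dof > 0 else float("nan")
    return lam, dispersion, chi2, p_value

# Synthetic stand-in for one store's daily demand.
rng = np.random.default_rng(0)
demo_demand = rng.poisson(4.0, size=2000)
lam, dispersion, chi2, p_value = poisson_check(demo_demand)
print(f"mean={lam:.2f} var/mean={dispersion:.2f} chi2={chi2:.2f} p={p_value:.3f}")
```

Running the same check per store or per season, rather than on the pooled data, follows the suggestion to treat stores as individuals: pooling stores with different mean demand produces a mixture, which will generally not be Poisson even if each store is.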
-
The data looks like Poisson, but it fails the goodness-of-fit test. I need to find the distribution of this demand data to generate demand in my simulation model. I am trying to find a distribution that fits this data. I have changed the Google Sheets settings; you can edit it now. – Manu9 May 13 '15 at 09:36