I have been looking for a way to calculate the minimum number of samples, Ne(min), required to train a classification model when the dataset is not normally distributed. A research paper suggests the following:
if the data are not normally distributed, an exponential relationship between d and N will be assumed and the number of samples that are required may be as plentiful as:
Ne(min) = Dsteps^d
where Dsteps is the discrete number of steps per feature.
d: dimension of the dataset.
....
It is useful to think of a histogram approach to understand this relationship. If we want to construct a histogram from data with at least one sample in each bin and with Dsteps discrete steps per feature, we will require at least Dsteps^d samples.
The number of samples required to model the data accurately is in this case an exponential function of d.
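To make the histogram argument concrete, here is a minimal Python sketch (an illustration only, not code from the paper; the same logic carries over to R or Matlab). The function names min_samples and occupied_bins are hypothetical, Dsteps is taken as given rather than estimated, and every feature is assumed to be already scaled to the range [0, 1).

```python
import random


def min_samples(d_steps: int, d: int) -> int:
    """Ne(min) = Dsteps^d: one sample for every bin of a d-dimensional histogram."""
    return d_steps ** d


def occupied_bins(samples, d_steps: int) -> int:
    """Count the distinct histogram cells hit by the samples.

    Assumes every feature value lies in [0, 1); each feature is cut into
    d_steps equal-width bins.
    """
    cells = set()
    for x in samples:
        # Map each feature value to its bin index (clamped to the last bin).
        cell = tuple(min(int(v * d_steps), d_steps - 1) for v in x)
        cells.add(cell)
    return len(cells)


random.seed(0)
d_steps, d = 4, 3                  # 4 steps per feature, 3 features
n_min = min_samples(d_steps, d)    # 4**3 = 64 bins in total

# Draw exactly n_min uniform random points and see how many bins they fill.
samples = [[random.random() for _ in range(d)] for _ in range(n_min)]
filled = occupied_bins(samples, d_steps)
print("bins:", n_min, "filled:", filled)
```

Drawing exactly Dsteps^d random points usually leaves some bins empty, so the formula is a lower bound on the sample count rather than a guarantee that every bin is covered.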
I would be very grateful if someone could help me determine or calculate this measure for my own dataset: the discrete number of steps per feature, Dsteps.
An explanation with R or Matlab code would be very helpful. Thank you :D
Edit:
Paper reference: Christiaan Maarten Van Der Walt: Data Measure that Characterises Classification Problems, 2008.