
It's very easy to generate normally distributed data with a desired mean and standard deviation:

IEnumerable<double> sample = MathNet.Numerics.Distributions.Normal.Samples(mean, sd).Take(n);

However, with a sufficiently large value for n you will get values miles away from the mean. To put it into context, I have a real-world data set with mean = 15.93 and sd = 6.84. For this data set it is impossible to have a value over 30 or under 0, but I cannot see a way to add upper and lower bounds to the data that is generated.

I can remove data that falls outside of this range as below, but this results in the mean and SD for the generated sample differing significantly (in my opinion, probably not statistically) from the values I requested.

Normal.Samples(mean, sd).Where(x => x is >= 0 and <= 30).Take(n);

Is there any way to ensure that the values generated fall within a specified range without affecting the mean and SD of the generated data?

Ben
  • I'm sorry but I don't think that's how the normal distribution works. – Sweeper Feb 11 '22 at 16:54
  • Are you looking for a [Truncated Normal Distribution](https://en.wikipedia.org/wiki/Truncated_normal_distribution)? – Klaus Gütter Feb 12 '22 at 07:22
  • @Emond in the real world you can definitely have normally distributed data with a finite upper/lower bound. Exam results for example will have a min (0 correct) and a max (all correct). – Ben Feb 14 '22 at 09:35
  • @KlausGutter yes I think I am after a Truncated Normal Distribution, thank you for teaching me a new term! Any idea if you can produce such a distribution with Math.NET? – Ben Feb 14 '22 at 09:36
  • I know this is an old thread, but, if you are interested, I have some similar code I could simplify into an example to post. It's not quite a truncated normal distribution though. It's what I call a **discrete normal distribution**. Not only is the range set to specified points, but the whole distribution has a specified number of discrete points on the x axis, starting and ending with those range limits, rather than being continuously variable. It's great for music, which is my application. – SimonOR Aug 07 '22 at 21:51
  • Regarding my previous comment, I should clarify that my **discrete normal distribution** is based on Math.Net's **Normal** distribution. So it very likely could be modified to be a **(continuous) truncated normal distribution** if required. – SimonOR Aug 07 '22 at 22:15
  • @SimonOR that sounds promising! Yes I'd appreciate it if you don't mind sharing. If you have any ideas on how to modify it to a continuous truncated distribution as suggested that would be even more fantastic! – Ben Aug 09 '22 at 13:55

1 Answer


The following proposed solution derives both the mean and the standard deviation from the bounds: the mean is the midpoint of the range, and the standard deviation has to be a third of the difference between the mean and the required minimum or maximum, so that about 99.7% of samples fall inside the range.
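As a worked check of that rule, here is the arithmetic written out. The symbols \mu, \sigma and \Phi (the standard normal CDF) are notation introduced here for illustration; they do not appear in the code.

```latex
\mu = \frac{x_{\min} + x_{\max}}{2}, \qquad
\sigma = \frac{\mu - x_{\min}}{3} = \frac{x_{\max} - x_{\min}}{6}

P(x_{\min} \le X \le x_{\max}) = \Phi(3) - \Phi(-3) \approx 0.9973

% Bounds from the question: x_min = 0, x_max = 30
\mu = \frac{0 + 30}{2} = 15, \qquad \sigma = \frac{30 - 0}{6} = 5
```

So for the bounds in the question this approach yields a mean of 15 and a standard deviation of 5, not the requested 15.93 and 6.84: the bounds determine both parameters rather than leaving them free.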

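This first code block is the TruncatedNormalDistribution class, which encapsulates MathNet's Normal class. The main technique for making a truncated normal distribution is in the constructor. Because roughly 0.3% of samples from the underlying Normal still fall outside the range, the Sample method needs a workaround: it clamps those samples to the bounds. Note that Density and CumulativeDistribution are passed straight through to the underlying, untruncated Normal: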

using System; // Math.Clamp
using MathNet.Numerics.Distributions;

public class TruncatedNormalDistribution {
    public TruncatedNormalDistribution(double xMin, double xMax) {
        XMin = xMin;
        XMax = xMax;
        double mean = XMin + (XMax - XMin) / 2; // Halfway between minimum and maximum.
        // If the standard deviation is a third of the difference between the mean and
        // the required minimum or maximum of a normal distribution, 99.7% of samples should
        // be in the required range.
        double standardDeviation = (mean - XMin) / 3;
        Distribution = new Normal(mean, standardDeviation);
    }

    private Normal Distribution { get; }
    private double XMin { get; }
    private double XMax { get; }

    public double CumulativeDistribution(double x) {
        return Distribution.CumulativeDistribution(x);
    }

    public double Density(double x) {
        return Distribution.Density(x);
    }

    public double Sample() {
        // Constrain results lower than XMin or higher than XMax
        // to those bounds.
        return Math.Clamp(Distribution.Sample(), XMin, XMax);
    }
}

And here is a usage example. For a visual representation of the results, open each of the two output CSV files in a spreadsheet, such as Excel, and map its data to a line chart:

using System.Collections.Generic; // Dictionary
using System.IO; // Path, StreamWriter

// Put the path of the folder where the CSVs will be saved here
const string chartFolderPath =
   @"C:\Insert\chart\folder\path\here";
const double xMin = 0;
const double xMax = 100;
var distribution = new TruncatedNormalDistribution(xMin, xMax);
// Densities
var dictionary = new Dictionary<double, double>();
for (double x = xMin; x <= xMax; x += 1) {
    dictionary.Add(x, distribution.Density(x));
}
string csvPath = Path.Combine(
    chartFolderPath, 
    $"Truncated Normal Densities, Range {xMin} to {xMax}.csv");
using var writer = new StreamWriter(csvPath);
foreach ((double key, double value) in dictionary) {
    writer.WriteLine($"{key},{value}");
}
// Cumulative Distributions
dictionary.Clear();
for (double x = xMin; x <= xMax; x += 1) {
    dictionary.Add(x, distribution.CumulativeDistribution(x));
}
csvPath = Path.Combine(
    chartFolderPath, 
    $"Truncated Normal Cumulative Distributions, Range {xMin} to {xMax}.csv");
using var writer2 = new StreamWriter(csvPath);
foreach ((double key, double value) in dictionary) {
    writer2.WriteLine($"{key},{value}");
}
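One side effect of the clamp in the Sample method above is that every out-of-range draw lands exactly on XMin or XMax, which puts a small spike at each bound. A possible alternative, sketched below, is rejection sampling: redraw until the value falls inside the range. This is only a sketch under stated assumptions. RejectionSampler and NextGaussian are hypothetical names, not part of Math.NET, and the normal draw uses a Box-Muller transform on System.Random purely to keep the example dependency-free.

```csharp
using System;

// Hypothetical helper (not part of Math.NET): draws from a normal distribution
// and redraws until the value lies inside [min, max].
public static class RejectionSampler
{
    // Box-Muller transform: turns two uniform draws into one normal draw.
    public static double NextGaussian(Random rng, double mean, double sd)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], so Math.Log never sees 0
        double u2 = rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return mean + sd * z;
    }

    // Redraws instead of clamping, so no probability mass piles up on the bounds.
    public static double Sample(Random rng, double mean, double sd, double min, double max)
    {
        if (!(sd > 0)) throw new ArgumentOutOfRangeException(nameof(sd));
        if (!(min < max)) throw new ArgumentException("min must be less than max.");
        double x;
        do
        {
            x = NextGaussian(rng, mean, sd);
        } while (x < min || x > max);
        return x;
    }
}
```

In the TruncatedNormalDistribution class, the same idea would be a do/while loop around Distribution.Sample() in place of Math.Clamp. Two caveats: the loop can take many iterations if the range covers only a far tail of the distribution, and discarding draws still narrows the spread, so the sample mean and SD will generally not match the parameters passed in.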
SimonOR