4

I have a IEnumerable<double> data sample. I want to compute the 90% confidence interval for the signal/data. I have MathNET library at my disposal, but I am confused as to how to correctly work with the library. Given my data, the idea is to return two additional data arrays that contain the original signal's confidence intervals

using MathNet.Numerics.Statistics;
using MathNet.Numerics.Distributions;

public static List<double[]> ConfidenceIntervals(IEnumerable<double> sample, double interval)
{
    Contract.Requires(interval > 0 && interval < 1.0);
    int sampleSize = sample.Count();
    double alpha = 1.0 - interval;
    double mean = sample.Mean();
    double sd = sample.StandardDeviation();

    double t, mu;
    double[] upper = new double[sampleSize];
    double[] lower = new double[sampleSize];
    StudentT studentT = new StudentT(mean, alpha, sampleSize - 1);
    int index = 0;
    foreach (double d in sample)
    {
        t = studentT.CumulativeDistribution(d);
        double tmp = t * (sd / Math.Sqrt(sampleSize));
        mu = mean - tmp;
        upper[index] = d + mu;
        lower[index] = d - mu;
    }
    return new List<double[]>() { upper, lower };
}

This really is not complex in terms of mathematics, I am just confused as to how to correctly use the functions/methods available to me in the MathNET library.

TylerH
  • 20,799
  • 66
  • 75
  • 101
MoonKnight
  • 23,214
  • 40
  • 145
  • 277

2 Answers2

8

I'm not entirely sure I understand how the confidence interval of the signal is supposed to be applied to each sample of the signal, but we can compute the confidence interval of the sample set as follows:

public static Tuple<double, double> A(double[] samples, double interval)
{
    double theta = (interval + 1.0)/2;
    double mean = samples.Mean();
    double sd = samples.StandardDeviation();
    double T = StudentT.InvCDF(0,1,samples.Length-1,theta);
    double t = T * (sd / Math.Sqrt(samples.Length));
    return Tuple.Create(mean-t, mean+t);
}

Except that the line where we compute T does not compile because unfortunately there is no StudentT.InvCDF in current Math.NET Numerics yet. But we can still evaluate it numerically as a workaround in the meantime:

var student = new StudentT(0,1,samples.Length-1);
double T = FindRoots.OfFunction(x => student.CumulativeDistribution(x)-theta,-800,800);

For example, with 16 samples and alpha 0.05 we get 2.131 as expected. If there are more than ~60-100 samples, this can also be approximated with the normal distribution:

double T = Nomal.InvCDF(0,1,theta);

So all in all:

public static Tuple<double, double> B(double[] samples, double interval)
{
    double theta = (interval + 1.0)/2;
    double T = FindRoots.OfFunction(x => StudentT.CDF(0,1,samples.Length-1,x)-theta,-800,800);

    double mean = samples.Mean();
    double sd = samples.StandardDeviation();
    double t = T * (sd / Math.Sqrt(samples.Length));
    return Tuple.Create(mean-t, mean+t);
}

This is not the full answer yet as I understand you wanted to somehow apply the confidence interval to each sample, but hopefully it helps on the way to get there.

PS: Using Math.NET Numerics v3.0.0-alpha7

Christoph Rüegg
  • 4,626
  • 1
  • 20
  • 34
  • Crisroph, thanks very much for your reply. I am finishing for the day now so will implement and feed back tomorrow. I might ask another question around MathNET tomorrow, it is regarding the requirement to cast when performing matrix operations `Matrix M = (DenseMatrix)(SomeMatrixA * SomeMatrixB)`. This is a separate question, so I will ask another tomorrow. Also, thanks very much for maintaining such a great library! All the best... – MoonKnight Dec 30 '13 at 22:05
  • Sorry in the delay in getting back to you here. I have just attempted to use the code you have provided above, but there is no `StudentT.CDF` method available to me; There is with the Normal distribution however. Do I need to update my library? – MoonKnight Jan 03 '14 at 17:18
  • Note, I say that there is with the `Normal` distribution. There is a `public static` method called `CDF`, but I can't seem to be able to access this using the latest NuGet Package; if I download the source and compile this myself then it is fine? – MoonKnight Jan 03 '14 at 17:48
  • `StudentT.CDF` was added in v3.0.0-alpha7 (assembly version 3.0.0.40). What version are you using? – Christoph Rüegg Jan 04 '14 at 11:47
  • I am not sure. I was using the Package Manager with command `PM> Install-Package MathNet.Numerics` and this got me V2.6.1.30. I have got the source code, and this seems to output V1.0.0.0 - i.e. version not set. Where can I get the latest release DLL and the latest source? Thanks for your time... – MoonKnight Jan 04 '14 at 14:03
  • 1
    The sources do not set a version number on local builds by design. If you've got the sources from GitHub they are likely up to date. For NuGet you need to explicitly allow pre-releases though: `PM> Install-Package MathNet.Numerics -Pre`. – Christoph Rüegg Jan 04 '14 at 14:16
  • v3.0.0-alpha7 is also on [CodePlex](http://mathnetnumerics.codeplex.com/releases), see top of the list in the 'other downloads' box – Christoph Rüegg Jan 23 '14 at 00:12
  • Thanks very much. I take it the new `Matrix.Build.Dense(..)` etc. is the method that will be adopted going forward? – MoonKnight Jan 23 '14 at 13:20
  • It seems so, yes. It is still a bit long for my taste, but there is actually no redundant information in there we could drop. Of course you can do `static readonly MatrixBuilder M = Matrix.Build;` and then just write `M.Dense(..)`. – Christoph Rüegg Jan 23 '14 at 17:47
  • Nice. I wanted to ask, have you built this from scratch on your own? It is a very impressive library and the more I use it the more I realize it. I hope that it remains open source! – MoonKnight Jan 24 '14 at 09:04
1

I noticed that you didn't increase the index value in foreach loop. This will make the value at index 0 is replaced by the next calculation (When you try to set upper[index] and lower[index] values).

So I guess this is a reason why you got the incorrect results.

If so, your code should be

using MathNet.Numerics.Statistics;
using MathNet.Numerics.Distributions;

public static List<double[]> ConfidenceIntervals(IEnumerable<double> sample, double interval)
{
    Contract.Requires(interval > 0 && interval < 1.0);
    int sampleSize = sample.Count();
    double alpha = 1.0 - interval;
    double mean = sample.Mean();
    double sd = sample.StandardDeviation();

    double t, mu;
    double[] upper = new double[sampleSize];
    double[] lower = new double[sampleSize];
    StudentT studentT = new StudentT(mean, alpha, sampleSize - 1);
    int index = 0;
    foreach (double d in sample)
    {
        t = studentT.CumulativeDistribution(d);
        double tmp = t * (sd / Math.Sqrt(sampleSize));
        mu = mean - tmp;
        upper[index] = d + mu;
        lower[index] = d - mu;
        index++;
    }
    return new List<double[]>() { upper, lower };
}
Alice
  • 1,255
  • 1
  • 9
  • 7
  • 1
    +1 You are right with the index incrementation. However, this was an oversight when I typed out the question, so apologies. The implementation above gives the wrong confidence levels. They are way off. Do you use MathNET? Is there anything else that you can see that stands out as wrong with the above implementation? Thanks for your time. – MoonKnight Dec 30 '13 at 14:36
  • No, I never use it, but I already have installed this Lib. Maybe I could help you more :) I will test on my computer and I want to know which are **sample** and **interval** values? – Alice Dec 30 '13 at 16:02