
I have recently been studying numerical algorithms in C#, so I ran some experiments to find the most suitable math library for .NET. One thing I do very often is evaluate objective functions, which are usually functions that take a vector as input and return a vector as output. I compared implementations of the same objective function using ILNumerics, plain system arrays and Math.NET. The syntax of ILNumerics really makes it stand out, because for lengthy mathematical formulas it resembles that of MATLAB and R. However, I discovered that for the same number of evaluations, ILNumerics seems to take much longer than either system arrays or Math.NET. Below is the code I used for the comparison. I'm not doing any linear algebra here, just applying math formulas over long vectors.

[Test]
public void TestFunctionEval()
{
    int numObj = 2;
    int m = 100000;
    Func<double[], double[]> fun1 = (x) =>
    {
        double[] z = new double[numObj];
        z[0] = x[0];
        double g = 1.0;
        for (int i = 1; i < x.Length; i++)
            g = g + 9.0 * x[i] / (m - 1);
        double h = 1.0 - Math.Sqrt(z[0] / g);
        z[1] = g * h;
        return z;
    };

    Func<ILArray<double>, ILArray<double>> fun2 = (x) =>
    {
        ILArray<double> z = zeros(numObj);
        z[0] = x[0];
        ILArray<double> g = 1.0 + 9.0 * sum(x[r(1, end)]) / (m - 1);
        ILArray<double> h = 1.0 - sqrt(z[0] / g);
        z[1] = g * h;
        return z;
    };

    Func<Vector<double>, Vector<double>> fun3 = (x) =>
    {
        DenseVector z = DenseVector.Create(numObj, (i) => 0);
        z[0] = x[0];
        double g = 1.0 + 9.0*(x.SubVector(1, x.Count - 1) / (m - 1)).Sum();
        double h = 1.0 - Math.Sqrt(z[0] / g);
        z[1] = g * h;
        return z;
    };

    int n = 1000;
    ILArray<double> xs = rand(n, m);
    IList<double[]> xRaw = new List<double[]>();
    for (int i = 0; i < n; i++)
    {
        double[] row = xs[i, full].ToArray();
        xRaw.Add(row);
    }
    DenseMatrix xDen = DenseMatrix.OfRows(n, m, xRaw);
    Stopwatch watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < n; i++)
    {
        ILArray<double> ret = fun1(xRaw[i]);
    }
    watch.Stop();
    log.InfoFormat("System array took {0} seconds.", watch.Elapsed.TotalSeconds);
    watch.Reset();
    watch.Start();
    for (int i = 0; i < n; i++)
    {
        ILArray<double> ret = fun2(xs[i, full]);
    }
    watch.Stop();
    log.InfoFormat("ILNumerics took {0} seconds.", watch.Elapsed.TotalSeconds);
    watch.Reset();
    watch.Start();
    for (int i = 0; i < n; i++)
    {
        var ret = fun3(xDen.Row(i));
    }
    watch.Stop();
    log.InfoFormat("Math.Net took {0} seconds.", watch.Elapsed.TotalSeconds);
}

Unfortunately, the test shows that ILNumerics is taking too long to do something so simple.

315 | System array took 0.7117623 seconds.
323 | ILNumerics took 14.5100766 seconds.
330 | Math.Net took 5.3917536 seconds.

I really like the way ILNumerics makes the code look like mathematical formulas. However, if it takes several times as long as system arrays or Math.NET to evaluate functions such as the one above, I will have to choose one of the alternatives, even though that leads to longer functions that are harder to read.

Am I using ILNumerics in the wrong way? Or is it slower by design in this kind of scenario? Maybe I'm not using it for the purpose it suits best. Can someone explain?

ILNumerics 3.2.2.0 and Math.NET.Numerics 2.6.1.30 are used in the test.

doraemon

2 Answers


Yes, you are missing some general performance testing rules. And the comparison is also not fair:

  1. For the ILNumerics implementation, you create a lot of temporaries of considerable size. This is a disadvantage compared to the other implementations, where you create the long vector only once and do all operations in an 'inner loop'. The inner loop will always be faster, at the expense of a less expressive syntax and more programming effort. If you need that performance, you can always use x.GetArrayForRead() and x.GetArrayForWrite() to work on the underlying System.Array directly. This gives you the same options as in your System.Array test.

  2. Your ILNumerics test includes a lot of subarray creation (and new memory allocations) that the other tests do not include. For example, you derive the subarrays from the big test data matrix inside your measurement loop.

Why not design the tests this way: create one large matrix for each test individually. Use Math.NET matrices for the Math.NET test, a System.Array matrix for the System.Array test and an ILArray for the ILNumerics test. In every iteration, extract the corresponding row and pass it to the corresponding function.

Do not forget to follow the ILNumerics function rules: http://ilnumerics.net/GeneralRules.html and to run the test with a Release build without any debugger attached. As usual, leave out the time needed for the first iteration.
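The measurement advice above (skip the first iteration, time several rounds) can be sketched with nothing but System.Diagnostics.Stopwatch. This is only an illustrative helper: the names BenchmarkSketch and Measure are made up for this example and are not part of ILNumerics, Math.NET or the test code in the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library discussed in this thread.
public static class BenchmarkSketch
{
    // Times `action` over `rounds` rounds of `callsPerRound` calls each.
    // One untimed warm-up round runs first, so JIT compilation and other
    // one-time initialization are not attributed to the code under test.
    public static double[] Measure(Action action, int rounds, int callsPerRound)
    {
        for (int i = 0; i < callsPerRound; i++)
            action(); // warm-up, not timed

        var seconds = new double[rounds];
        var watch = new Stopwatch();
        for (int k = 0; k < rounds; k++)
        {
            watch.Restart(); // resets the elapsed time and starts timing
            for (int i = 0; i < callsPerRound; i++)
                action();
            watch.Stop();
            seconds[k] = watch.Elapsed.TotalSeconds;
        }
        return seconds;
    }
}
```

Each library's function would be wrapped in its own lambda and passed to Measure, with the input data prepared outside the lambda so that conversions and subarray extraction are not part of the timed region. The build configuration still matters: a Debug build or an attached debugger will distort the numbers regardless of the harness.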

Depending on your system (and the automatic parallelization options it brings), ILNumerics might still be slower. In that case, consider applying the further optimization options of ILNumerics, or optimize the inner loop by resorting to System.Array.

@Edit: One more note: you are probably aware that micro-benchmarks like this one, which do not do any useful work, are always somewhat misleading. The results may not be a suitable basis for predicting the performance of the final application. One example: if you iterate over large arrays solely with System.Array for a long time, you will probably end up spending most of the time in GC instead of computing numbers. You will have to be careful not to allocate any new storage, which makes your code even more clumsy.
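To make the allocation point concrete, here is a minimal sketch of the two styles with plain arrays. The class and method names (ObjectiveSketch, Evaluate, EvaluateInto) are hypothetical, and the formula follows fun2 from the question (sum the tail, then scale once).

```csharp
using System;

// Hypothetical names; the formula mirrors fun2 from the question.
public static class ObjectiveSketch
{
    // Allocating style: every call returns a freshly allocated result array,
    // which the GC has to collect later.
    public static double[] Evaluate(double[] x, int m)
    {
        var z = new double[2];
        EvaluateInto(x, m, z);
        return z;
    }

    // Non-allocating style: the caller supplies the result buffer and can
    // reuse it across calls, so the loop itself allocates nothing.
    public static void EvaluateInto(double[] x, int m, double[] z)
    {
        double sum = 0.0;
        for (int i = 1; i < x.Length; i++)
            sum += x[i];
        double g = 1.0 + 9.0 * sum / (m - 1);
        double h = 1.0 - Math.Sqrt(x[0] / g);
        z[0] = x[0];
        z[1] = g * h;
    }
}
```

With a two-element result the saving is negligible. The pattern only starts to matter when the results or intermediate arrays are large, and it shows the cost mentioned above: the caller now has to manage buffers by hand.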

ILNumerics, if used correctly, prevents you from spending time in GC by reusing your memory automatically. It also parallelizes your algorithm internally (although working only on vectors, as in your example, is not demanding enough to benefit from parallelization).

Haymo Kutschbach
  • To my understanding, both 1) and 2) are actually exactly the same in the Math.NET test variant? – Christoph Rüegg Oct 07 '13 at 13:27
  • Right. And if we replace fun2(xs[i, full]) with fun2(xRaw[i]) we actually get very similar results compared to the Math.NET version. – Haymo Kutschbach Oct 07 '13 at 13:48
  • Then you'd also want to do the same in the Math.NET variant and use fun3(new DenseVector(xRaw[i])) instead of fun3(xDen.Row(i)) ;) – Christoph Rüegg Oct 07 '13 at 14:11
  • Yes, maybe Qian could re-run the tests with these changes and provide us with the results? – Haymo Kutschbach Oct 07 '13 at 14:29
  • Sorry Haymo I forgot to indicate that this is actually the second test. The first was with ILScope and all the rules in place, I was just trying to test out whether with or without ILScope.Enter the code runs faster. Thank you everyone for your suggestions. I'll try out and let you know if there's improvement indeed! @Christoph, I think x.SubVector(1, x.Count) is alright because the second arg is the count of elements to take if I read the api correctly. Thanks for the reminder anyway. – doraemon Oct 07 '13 at 15:43
  • @Christoph, sorry, my mistake: x.SubVector(1, numObj) is indeed wrong, it should be x.Count - 1. I'll change it and test again. – doraemon Oct 07 '13 at 16:02
  • @Haymo, I retested and confirmed your recommendation. The slowness is because of fun2(xs[i, full]). When changed to fun2(xRaw[i]), the performance is better than system array (except for the first iteration). Thanks! By the way, I needed to do this redundant test just to make sure the objective function evaluation is not slowed down by any libraries used, because the function is called at least a few thousand times each time in real use. – doraemon Oct 07 '13 at 17:46
  • I'd recommend always following the ILNumerics function guidelines, especially if you are constantly calling a performance-sensitive function and are dealing with large data. Using ILScope.Enter() may not bring any measurable performance gain in certain situations, but it frees you from the need to monitor the time spent in GC manually. Otherwise: using a profiler is always good advice, of course ;) – Haymo Kutschbach Oct 07 '13 at 17:59

I changed the test to the following, and ILNumerics now performs faster:

        [Test]
        public void TestFunctionEval()
        {
            int numObj = 2;
            int m = 100000;

            Func<double[], double[]> fun1 = (x) =>
            {
                double[] z = new double[numObj];
                z[0] = x[0];
                double g = 1.0;
                for (int i = 1; i < x.Length; i++)
                    g = g + 9.0 * x[i] / (m - 1);
                double h = 1.0 - Math.Sqrt(z[0] / g);
                z[1] = g * h;
                return z;
            };

            Func<ILInArray<double>, ILRetArray<double>> fun2 = (xIn) =>
            {
                using (ILScope.Enter(xIn))
                {
                    ILArray<double> x = xIn;
                    ILArray<double> z = zeros(numObj);
                    z[0] = x[0];
                    ILArray<double> g = 1.0 + 9.0*sum(x[r(1, end)])/(m - 1);
                    ILArray<double> h = 1.0 - sqrt(z[0]/g);
                    z[1] = g*h;
                    return z;
                }
            };

            Func<Vector<double>, Vector<double>> fun3 = (x) =>
            {
                DenseVector z = DenseVector.Create(numObj, (i) => 0);
                z[0] = x[0];
                double g = 1.0 + 9.0*(x.SubVector(1, m - 1) / (m - 1)).Sum();
                double h = 1.0 - Math.Sqrt(z[0] / g);
                z[1] = g * h;
                return z;
            };

            int n = 1000;
            ILArray<double> xs = rand(n, m);
            IList<double[]> xRaw = new List<double[]>();
            for (int i = 0; i < n; i++)
            {
                double[] row = xs[i, full].ToArray();
                xRaw.Add(row);
            }
            DenseMatrix xDen = DenseMatrix.OfRows(n, m, xRaw);

            int numTest = 10;

            for (int k = 0; k < numTest; k++)
            {
                log.InfoFormat("Round {0}.", k);
                Stopwatch watch = new Stopwatch();
                watch.Reset();
                watch.Start();
                for (int i = 0; i < n; i++)
                {
                    ILArray<double> ret = fun1(xRaw[i]);
                }
                watch.Stop();
                log.InfoFormat("System array took {0} seconds.", watch.Elapsed.TotalSeconds);
                watch.Reset();
                watch.Start();
                for (int i = 0; i < n; i++)
                {
                    // ILArray<double> ret = fun2(xs[i, full]);
                    ILArray<double> ret = fun2(xRaw[i]);
                }
                watch.Stop();
                log.InfoFormat("ILNumerics took {0} seconds.", watch.Elapsed.TotalSeconds);
                watch.Reset();
                watch.Start();
                for (int i = 0; i < n; i++)
                {
                    // var ret = fun3(xDen.Row(i));
                    var ret = fun3(DenseVector.OfEnumerable(xRaw[i]));
                }
                watch.Stop();
                log.InfoFormat("Math.Net took {0} seconds.", watch.Elapsed.TotalSeconds);
            }
        }

 NumericsTest   318      Round 0.
 NumericsTest   327      System array took 0.7008772 seconds.
 NumericsTest   336      ILNumerics took 1.9559407 seconds.
 NumericsTest   315      Math.Net took 5.2027841 seconds.
 NumericsTest   318      Round 1.
 NumericsTest   327      System array took 0.6791225 seconds.
 NumericsTest   336      ILNumerics took 0.4739782 seconds.
 NumericsTest   315      Math.Net took 4.931067 seconds.
 NumericsTest   318      Round 2.
 NumericsTest   327      System array took 0.6734302 seconds.
 NumericsTest   336      ILNumerics took 0.470311 seconds.
 NumericsTest   315      Math.Net took 4.8086843 seconds.
 NumericsTest   318      Round 3.
 NumericsTest   327      System array took 0.6801929 seconds.
 NumericsTest   336      ILNumerics took 0.471479 seconds.
 NumericsTest   315      Math.Net took 4.8423348 seconds.
 NumericsTest   318      Round 4.
 NumericsTest   327      System array took 0.6761803 seconds.
 NumericsTest   336      ILNumerics took 0.4709513 seconds.
 NumericsTest   315      Math.Net took 4.7920563 seconds.
 NumericsTest   318      Round 5.
 NumericsTest   327      System array took 0.6820961 seconds.
 NumericsTest   336      ILNumerics took 0.471545 seconds.
 NumericsTest   315      Math.Net took 4.7798939 seconds.
 NumericsTest   318      Round 6.
 NumericsTest   327      System array took 0.6779479 seconds.
 NumericsTest   336      ILNumerics took 0.4862169 seconds.
 NumericsTest   315      Math.Net took 4.5421089 seconds.
 NumericsTest   318      Round 7.
 NumericsTest   327      System array took 0.6760993 seconds.
 NumericsTest   336      ILNumerics took 0.4704415 seconds.
 NumericsTest   315      Math.Net took 4.8233003 seconds.
 NumericsTest   318      Round 8.
 NumericsTest   327      System array took 0.6759367 seconds.
 NumericsTest   336      ILNumerics took 0.4710648 seconds.
 NumericsTest   315      Math.Net took 4.7945989 seconds.
 NumericsTest   318      Round 9.
 NumericsTest   327      System array took 0.6761679 seconds.
 NumericsTest   336      ILNumerics took 0.4779321 seconds.
 NumericsTest   315      Math.Net took 4.7426801 seconds.
doraemon
  • Thanks for posting the results. Since you are not altering the input parameters in your inner function, you might try to set Settings.AllowInArrayAssignments = false; - it might give you even a little more speed... ? Otherwise - good implementation! :) – Haymo Kutschbach Oct 07 '13 at 18:13
  • Previously it was unfair towards ILNumerics, now it is just as unfair to Math.NET Numerics. For example, in fun3 you do vector division while fun2 you just do a floating point division. And instead of .Create and .OfEnumerable you'd just want to use the DenseVector constructor in both cases. ILNumerics will still be faster but Math.NET will be at least equivalent to the array implementation... – Christoph Rüegg Oct 07 '13 at 18:23
  • Actually it seems the array implementation does compute something else as well. If I change it to do the same thing as fun2 and update fun3 as mentioned above, I typically get results like this: System array took 0.1459477 seconds. ILNumerics took 0.3527597 seconds. Math.Net took 0.7115329 seconds. – Christoph Rüegg Oct 07 '13 at 18:40
  • I ran the tests as described by Christoph and got similar results. Couldn't resist 'optimizing' away the additional subarray access by replacing 'sum(x[r(1, end)])' with 'sum(x) - x[0]', and now fun2 gets pretty close to fun1. I suppose Math.NET Numerics would achieve the same if we brought it even closer to fun1. But hey, it is still a micro-benchmark, and it does not stress the GC. In a real-world implementation the situation will probably look completely different. – Haymo Kutschbach Oct 07 '13 at 21:04
  • So in the end you can achieve similar speed with or without any library. You will use a profiler to locate potential bottlenecks _in the final prototype_. Which way to go is, IMO, more a question of the effort you have to spend on writing the prototype and on optimizing the result. – Haymo Kutschbach Oct 07 '13 at 21:13
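
The reformulations discussed in the comments above (dividing once after summing instead of once per element, and replacing the subarray sum with sum(x) - x[0]) can be checked against each other with plain arrays. This is only a sketch: the class and method names are invented for illustration, and the three variants agree only up to floating-point rounding, not bit for bit.

```csharp
using System;

// Hypothetical names; each method computes the same quantity g.
public static class GFormulations
{
    // Scales every element inside the loop, as fun1 in the question does.
    public static double PerElement(double[] x, int m)
    {
        double g = 1.0;
        for (int i = 1; i < x.Length; i++)
            g = g + 9.0 * x[i] / (m - 1);
        return g;
    }

    // Sums the tail x[1..] first and scales once, as fun2 does.
    public static double SumThenScale(double[] x, int m)
    {
        double sum = 0.0;
        for (int i = 1; i < x.Length; i++)
            sum += x[i];
        return 1.0 + 9.0 * sum / (m - 1);
    }

    // Sums the whole vector and subtracts the first element, mirroring
    // the 'sum(x) - x[0]' rewrite that avoids creating a subarray.
    public static double FullSumMinusFirst(double[] x, int m)
    {
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
            sum += x[i];
        return 1.0 + 9.0 * (sum - x[0]) / (m - 1);
    }
}
```

The first variant performs one multiplication and one division per element, the other two only one addition per element. That difference in work per element is one reason the three test functions in the thread were not directly comparable.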