
I'm trying to understand a problem I ran into recently in my project. I use the Aurigma library to resize images; it runs in single-threaded mode and uses only one thread during a calculation. Recently I decided to move to ImageMagick (IM), because it is free and open source. I built IM in single-threaded mode and started testing. First I wanted to compare their performance without interference, so I wrote a test that gives the process and its thread high priority and sets the affinity to a single core. In that test IM was about 25% faster than Aurigma. But the more threads I added, the smaller IM's advantage over Aurigma became.

My project is a Windows service that starts about 7-10 child processes, each of which has 2 threads that process images. When I ran my test as two separate processes with 2 threads each, IM was about 5% slower than Aurigma.

Maybe my question is not very detailed, but this area is a little new to me and I would be glad to get a direction for further investigation. How can one program be faster when it runs on one thread in one process, but slower when it runs in multiple processes at the same time?

For example:

Au: 8 processes x 2 threads (20 tasks per thread) = 320 tasks in 245 s

IM: 8 processes x 2 threads (20 tasks per thread) = 320 tasks in 280 s

Au: 4 processes x 2 threads (20 tasks per thread) = 160 tasks in 121 s

IM: 4 processes x 2 threads (20 tasks per thread) = 160 tasks in 141 s

We can see that Au does better when there is more than one process, but in single-process mode Au processes one task in 2.2 s and IM in 1.4 s, so the total time is better for IM.
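Taking the figures above at face value, the ratio of IM time to Au time in each configuration works out as follows (T is the reported total for a run, t the single-process time per task):

$$
\left.\frac{T_{\mathrm{IM}}}{T_{\mathrm{Au}}}\right|_{8\times 2} = \frac{280}{245} \approx 1.14, \qquad
\left.\frac{T_{\mathrm{IM}}}{T_{\mathrm{Au}}}\right|_{4\times 2} = \frac{141}{121} \approx 1.17, \qquad
\left.\frac{t_{\mathrm{IM}}}{t_{\mathrm{Au}}}\right|_{\text{single}} = \frac{1.4}{2.2} \approx 0.64
$$

So in these runs IM needs roughly 14 to 17% more time than Au under multi-process load, while on its own it needs about 36% less time per task; that reversal is the behavior to explain.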

```csharp
private static void ThreadRunner(
    Action testFunc,
    int repeatCount,
    int threadCount
    )
{
    WaitHandle[] handles = new WaitHandle[threadCount];

    var stopwatch = new Stopwatch();

    // Total wall-clock time for all threads, including each thread's warmup inside Runner
    stopwatch.Start();
    for (int h = 0; h < threadCount; h++)
    {
        var handle = new ManualResetEvent(false);
        handles[h] = handle;

        var thread = new Thread(() =>
        {
            Runner(testFunc, repeatCount);
            handle.Set(); // signal that this thread has finished
        });

        thread.Name = "Thread id" + h;
        thread.IsBackground = true;
        thread.Priority = ThreadPriority.Normal;

        thread.Start();
    }

    // Note: WaitHandle.WaitAll accepts at most 64 handles
    WaitHandle.WaitAll(handles);
    stopwatch.Stop();
    foreach (var handle in handles) handle.Dispose();
    Console.WriteLine("All Threads Total time taken " + stopwatch.ElapsedMilliseconds);
}
```

```csharp
private static void Runner(
    Action testFunc,
    int count
    )
{
    //Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // bitmask 0b10: use only the second core
    // These settings apply to the whole process, so every thread that calls Runner re-applies them
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
    Process.GetCurrentProcess().PriorityBoostEnabled = false;
    Thread.CurrentThread.Priority = ThreadPriority.Normal;

    var stopwatch = new Stopwatch();

    // warmup: call the test repeatedly for 10 seconds before measuring
    stopwatch.Start();
    while (stopwatch.ElapsedMilliseconds < 10000)
        testFunc();
    stopwatch.Stop();

    long elmsec = 0;
    for (int i = 0; i < count; i++)
    {
        stopwatch.Restart(); // equivalent to Reset() followed by Start()
        testFunc();
        stopwatch.Stop();

        elmsec += stopwatch.ElapsedMilliseconds;
        // ElapsedTicks are Stopwatch ticks (see Stopwatch.Frequency), not TimeSpan ticks
        Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
                          " mS: " + stopwatch.ElapsedMilliseconds + " Thread name: " + Thread.CurrentThread.Name);
    }

    Console.WriteLine("Total time taken " + elmsec + " Thread name: " + Thread.CurrentThread.Name);
}
```
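In `ThreadRunner` the stopwatch starts before the threads are created, so the "All Threads Total time taken" figure also includes the 10-second warmup each thread runs inside `Runner`. A minimal sketch of one way to keep the warmup out of the total is below. It assumes C# 9 (top-level statements) and uses `System.Threading.Barrier`; `TimedPhaseMs`, `DummyWork` and the fixed `warmupCalls` count (in place of a 10-second warmup) are hypothetical choices for illustration, not part of the original test.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs testFunc on threadCount threads. Every thread first warms up on its own,
// then waits at a barrier; the stopwatch covers only the timed phase that follows.
static long TimedPhaseMs(Action testFunc, int repeatCount, int threadCount, int warmupCalls)
{
    // Participants: all worker threads plus this coordinating thread.
    using var startBarrier = new Barrier(threadCount + 1);
    var threads = new Thread[threadCount];

    for (int t = 0; t < threadCount; t++)
    {
        threads[t] = new Thread(() =>
        {
            for (int i = 0; i < warmupCalls; i++) testFunc();   // warmup, not timed
            startBarrier.SignalAndWait();                       // wait for the other threads
            for (int i = 0; i < repeatCount; i++) testFunc();   // timed phase
        })
        { Name = "Worker " + t, IsBackground = true };
        threads[t].Start();
    }

    startBarrier.SignalAndWait();   // returns once every worker has finished its warmup
    var stopwatch = Stopwatch.StartNew();
    foreach (var thread in threads) thread.Join();
    stopwatch.Stop();
    return stopwatch.ElapsedMilliseconds;
}

// Hypothetical stand-in for the resize test; replace with the real call.
static void DummyWork()
{
    double acc = 0;
    for (int i = 1; i < 2_000_000; i++) acc += Math.Sqrt(i);
    if (acc < 0) Console.WriteLine(acc);   // use the result so the loop is not removed
}

long ms = TimedPhaseMs(DummyWork, repeatCount: 20, threadCount: 2, warmupCalls: 3);
Console.WriteLine($"2 threads x 20 tasks (warmup excluded): {ms} ms");
```

Passing `() => PerformanceTests.RunResizeImageMagickTest(files[0])` instead of `DummyWork` would time only the phase in which all threads resize concurrently.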

```csharp
/// <summary>
/// Entry point
/// </summary>
/// <param name="args">Command-line arguments; the first one, if any, is passed to GetFiles.</param>
private static void Main(string[] args)
{
    var files = GetFiles(args.FirstOrDefault());
    if (!files.Any())
    {
        Console.WriteLine("Source files were not found.");
    }
    else
    {
        // Single-threaded runs: 20 timed iterations each
        Console.WriteLine("ImageMagick run... Resize");
        Runner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20);

        Console.WriteLine("Aurigma run... Resize");
        Runner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20);

        // Multi-threaded runs: 2 threads, 20 timed iterations per thread
        Console.WriteLine("ImageMagick run... multi Resize");
        ThreadRunner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20, 2);

        Console.WriteLine("Aurigma run... multi Resize");
        ThreadRunner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20, 2);
    }

    Console.WriteLine("Done");
    Console.ReadKey();
}
```

```csharp
public static void RunResizeImageMagickTest(string source)
{
    float[] ratios = { 0.25f, 0.8f, 1.4f };

    // load the source bitmap
    using (MagickImage bitmap = new MagickImage(source))
    {
        foreach (float ratio in ratios)
        {
            // determine the target image size
            var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));

            // copy the source and resize the copy (a ratio of 1.4 enlarges rather than shrinks)
            using (MagickImage thumbnail = new MagickImage(bitmap))
            {
                thumbnail.Resize(size.Width, size.Height);
            }
        }
    }
}
```

```csharp
public static void RunResizeAurigmaTest(string source)
{
    float[] ratios = { 0.25f, 0.8f, 1.4f };

    // load the source bitmap
    using (ABitmap bitmap = new ABitmap(source))
    {
        foreach (float ratio in ratios)
        {
            // determine the target image size
            var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));

            // resize into a new bitmap (a ratio of 1.4 enlarges rather than shrinks);
            // resize is disposed first, then thumbnail
            using (ABitmap thumbnail = new ABitmap())
            using (var resize = new Resize(size, InterpolationMode.HighQuality))
            {
                resize.ApplyTransform(bitmap, thumbnail);
            }
        }
    }
}
```

I've added the test code above. I use C#/.NET: ImageMagick is accessed through the ImageMagick.Net library, and Aurigma has its own .NET library as well. The .NET wrapper for IM is written in C++/CLI, while IM itself is written in C, so several languages are involved.

OpenMP for IM is off.

noname

1 Answer


It could be a memory-cache issue. When multiple threads use memory in a certain way, one thread can invalidate cache lines that another thread was using, and that thread then stalls while the data is reloaded from main memory.

Programs that are not purely number crunching, but rely heavily on traffic between the CPU and memory, are harder to analyze.
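One way to probe the cache/memory explanation on a given machine is a microbenchmark in which each thread streams over its own private buffer, larger than a typical last-level cache. No data is shared between threads, so if the time per pass grows as threads are added, the threads are competing for shared hardware (last-level cache, memory bandwidth) rather than for locks. This is a sketch, not part of the answer: it assumes C# 9 top-level statements, and the 32 MB buffer size and the cap of 8 threads are arbitrary assumptions. A sequential sum mostly measures memory bandwidth and does not reproduce IM's actual access pattern.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Each thread sums its own private array, so no data is shared between threads.
// If time per pass grows with the thread count, the threads are competing for
// shared resources such as the last-level cache and memory bandwidth.
static double SecondsPerPass(int threadCount, int bytesPerThread, int passes)
{
    var buffers = new long[threadCount][];
    for (int t = 0; t < threadCount; t++)
    {
        buffers[t] = new long[bytesPerThread / sizeof(long)];
        for (int i = 0; i < buffers[t].Length; i++) buffers[t][i] = i;   // commit the pages before timing
    }

    var threads = new Thread[threadCount];
    var stopwatch = Stopwatch.StartNew();
    for (int t = 0; t < threadCount; t++)
    {
        long[] buffer = buffers[t];   // declared inside the loop, so each thread captures its own buffer
        threads[t] = new Thread(() =>
        {
            long sum = 0;
            for (int p = 0; p < passes; p++)
                for (int i = 0; i < buffer.Length; i++) sum += buffer[i];
            if (sum == 42) Console.WriteLine(sum);   // use the result so the loop is not removed
        });
        threads[t].Start();
    }
    foreach (var thread in threads) thread.Join();
    stopwatch.Stop();

    // All threads run their passes concurrently, so wall time / passes is the time
    // one thread needs for a pass over its buffer at this level of contention.
    return stopwatch.Elapsed.TotalSeconds / passes;
}

const int BytesPerThread = 32 * 1024 * 1024;   // assumed to exceed the last-level cache
int threadsToTry = Math.Min(Environment.ProcessorCount, 8);

double single = SecondsPerPass(1, BytesPerThread, passes: 5);
double contended = SecondsPerPass(threadsToTry, BytesPerThread, passes: 5);
Console.WriteLine($"1 thread: {single * 1000:F1} ms/pass, {threadsToTry} threads: {contended * 1000:F1} ms/pass");
Console.WriteLine($"per-thread slowdown: {contended / single:F2}x");
```

A slowdown well above 1x on the production hardware would suggest that memory-heavy work like image resizing can lose per-thread speed when many copies run at once, independent of which library does it.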

Photon
  • Maybe you could suggest some practices for analyzing this problem? – noname Sep 02 '15 at 09:41
  • I'm assuming you're not going to change the code to optimize for your purposes, but only want to know how to use it optimally in the current situation. I suggest simply running the test (automatically) as you did, and empirically selecting the number of processes / threads / tasks that gives the maximum performance. – Photon Sep 02 '15 at 09:43
  • @noname: use performance counters (e.g. Intel's VTune) to track cache misses. Last-level cache and main memory bandwidth are both shared between all cores. I'm not sure of the best way to see if multiple threads are saturating main memory bandwidth, but one thread isn't. http://agner.org/optimize/ has a lot of good stuff about optimizing C/C++ and asm, and low-level details about specific CPUs. – Peter Cordes Sep 02 '15 at 09:46
  • Photon, first I would like to know what I should optimize; moving to IM has many benefits for me and I don't want to give up so easily. I tried varying the number of processes/threads, but Aurigma is faster with every combination that is realistic for the production environment (if I have 8 cores I can't run only 1 process, I need 8). So the only way to move to IM is to find out why it has this problem with more than one process. ------ Peter Cordes, thanks a lot. I'll definitely look at it. – noname Sep 02 '15 at 09:57
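Photon's suggestion to pick the thread count empirically can be combined with a rough check of whether the extra threads are actually busy. The sketch below sweeps thread counts inside one process and reports throughput plus the average number of cores the process kept busy (CPU time divided by wall time). It assumes C# 9 top-level statements; `Measure` and `SampleTask` are hypothetical names, and `SampleTask` is only a CPU-bound stand-in for a resize. The process CPU time also includes runtime work such as GC and JIT, so the busy-core figure is approximate, and it covers threads only, not the multi-process layout of the real service.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs tasksPerThread calls of task on each of threadCount threads and returns
// the throughput plus the average number of cores the process kept busy.
static (double TasksPerSecond, double BusyCores) Measure(Action task, int threadCount, int tasksPerThread)
{
    using var process = Process.GetCurrentProcess();
    TimeSpan cpuBefore = process.TotalProcessorTime;
    var stopwatch = Stopwatch.StartNew();

    var threads = new Thread[threadCount];
    for (int t = 0; t < threadCount; t++)
    {
        threads[t] = new Thread(() =>
        {
            for (int i = 0; i < tasksPerThread; i++) task();
        });
        threads[t].Start();
    }
    foreach (var thread in threads) thread.Join();
    stopwatch.Stop();

    process.Refresh();   // discard any cached process information before reading again
    double cpuSeconds = (process.TotalProcessorTime - cpuBefore).TotalSeconds;
    double wallSeconds = stopwatch.Elapsed.TotalSeconds;
    return (threadCount * tasksPerThread / wallSeconds, cpuSeconds / wallSeconds);
}

// Hypothetical CPU-bound stand-in for one resize task.
static void SampleTask()
{
    double acc = 0;
    for (int i = 1; i < 1_000_000; i++) acc += Math.Sqrt(i);
    if (acc < 0) Console.WriteLine(acc);   // use the result so the loop is not removed
}

for (int n = 1; n <= Environment.ProcessorCount; n *= 2)
{
    var (tasksPerSecond, busyCores) = Measure(SampleTask, n, tasksPerThread: 5);
    Console.WriteLine($"{n,2} threads: {tasksPerSecond,8:F1} tasks/s, {busyCores:F2} cores busy");
}
```

If busy cores keeps pace with the thread count while tasks/s stops growing, the threads are running but getting less done per second, which fits the cache/memory explanation (the OS counts time stalled on memory as CPU time). If busy cores stays well below the thread count, the threads are more likely waiting on something such as a lock.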