
I have simplified my original issue into this test.

Using this class:

using System;
using System.Runtime.InteropServices;

public class Unmanaged : IDisposable
{
    private IntPtr unmanagedResource;

    public Unmanaged()
    {
        this.unmanagedResource = Marshal.AllocHGlobal(10 * 1024 * 1024);
    }

    public void DoSomethingWithThisClass()
    {
        Console.WriteLine($"{DateTime.Now} - {this.unmanagedResource.ToInt64()}");
    }

    private bool disposedValue = false; // To detect redundant calls

    protected virtual void Dispose(bool disposing)
    {
        if (!disposedValue)
        {
            Marshal.FreeHGlobal(unmanagedResource);
            unmanagedResource = IntPtr.Zero; // guard against double-free
            disposedValue = true;
        }
    }

    ~Unmanaged()
    {
        Dispose(false);
    }

    void IDisposable.Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
}

I have these two tests:

using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class UnitTest1
{
    const int Runs = 100000;

    [TestMethod]
    public void UsingFor()
    {
        for (var i = 0; i < Runs; i++)
        {
            using (var unman = new Unmanaged())
            {
                unman.DoSomethingWithThisClass();
            }
        }
    }

    [TestMethod]
    public void UsingParallelFor()
    {
        Parallel.For(0, Runs, new ParallelOptions { MaxDegreeOfParallelism = 10 },
            index =>
            {
                using (var unman = new Unmanaged())
                {
                    unman.DoSomethingWithThisClass();
                }
            });
    }
}

The ParallelFor test generally takes about twice as long as the regular for loop. According to the profiler, 62%-65% of the execution time is spent inside FreeHGlobal for the ParallelFor test, compared to only 52%-53% for the regular for loop.

I assumed that with modern RAM systems this would not make much of a difference. Is there any way to handle large chunks of unmanaged memory from multiple threads? Is there a way I can change this to make it multi-threaded?

If I do not Dispose of the memory used in each iteration (a bad idea, but just to test), Parallel.For is twice as fast, but then I can only keep about 4-5 of these alive at the same time (they are large amounts of image data) before the app crashes with, as you guessed, an OutOfMemoryException.

Why does calling Dispose on separate objects from more than one thread slow things down?

I can leave them single threaded if that is the only option, but I was hoping to speed this up.

Thank you.

    There's a lock built into AllocHGlobal(), it keeps the heap thread-safe. So what you are measuring is how long the lock is held, it inevitably takes longer while another thread also is busy allocating memory. – Hans Passant Feb 09 '17 at 21:53

1 Answer


FreeHGlobal almost certainly takes a process-wide heap lock, which means only one thread in your process can run it at a time. The other threads get in line and wait, and that contention is the overhead that makes the parallel version slower.

You can make it faster by creating a single large block of unmanaged memory and running a lock-free allocator in it.
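As a minimal sketch of that idea (this is not from the answer verbatim; the type and member names are illustrative): allocate one large block with a single AllocHGlobal call up front, then hand out slices of it with a lock-free bump pointer advanced by Interlocked, so worker threads never call AllocHGlobal/FreeHGlobal in the hot path.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical sketch: one big native block, sliced lock-free.
public sealed class BumpArena : IDisposable
{
    private readonly IntPtr _block;
    private readonly long _capacity;
    private long _offset; // advanced atomically by Rent, cleared by Reset

    public BumpArena(long capacityBytes)
    {
        _capacity = capacityBytes;
        _block = Marshal.AllocHGlobal(new IntPtr(capacityBytes));
    }

    // Returns a slice of the big block, or IntPtr.Zero when exhausted.
    public IntPtr Rent(long sizeBytes)
    {
        long start = Interlocked.Add(ref _offset, sizeBytes) - sizeBytes;
        return start + sizeBytes <= _capacity
            ? IntPtr.Add(_block, (int)start)
            : IntPtr.Zero;
    }

    // Invalidates all outstanding slices at once; no per-slice FreeHGlobal.
    public void Reset() => Interlocked.Exchange(ref _offset, 0);

    public void Dispose() => Marshal.FreeHGlobal(_block);
}
```

The trade-off is that slices cannot be freed individually; you reset the whole arena between batches of work, which fits workloads that process items in waves.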

hoodaticus
  • I did not realize there was an internal lock. That makes me approach the issue differently. I like your idea of creating a memory block large enough to hold about four of these in RAM at once. Then I can set up the jobs I need to process in a queue and have a controller divide them across different chunks of the RAM work area, clearing out a section used by a previous job before starting another task from the queue. It is a bit more coding overhead, but it should shave hours off the processing that needs to be done with these. – James Soult Feb 10 '17 at 04:16
  • You are definitely following the right path there in my experience. Try to think about ways you might be able to do it without locking, or alternately, look into the wonderful memory pools out there. – hoodaticus Feb 10 '17 at 14:22
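The queue-plus-pool approach discussed in these comments could look roughly like this (an illustrative sketch, not code from the thread; the pool size, buffer size, and names are assumptions): preallocate a few image-sized native buffers once, and rent/return them through a bounded BlockingCollection so worker threads block for a free slot instead of allocating.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the pooling idea: AllocHGlobal/FreeHGlobal run only at
// startup and shutdown, never inside the parallel loop.
public static class PoolDemo
{
    public static int Run()
    {
        const int PoolSize = 4;                   // e.g. four image-sized slots
        const int BufferBytes = 10 * 1024 * 1024;

        var pool = new BlockingCollection<IntPtr>(PoolSize);
        for (int i = 0; i < PoolSize; i++)
            pool.Add(Marshal.AllocHGlobal(BufferBytes));

        int processed = 0;
        Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = 10 },
            index =>
            {
                IntPtr buffer = pool.Take();      // blocks until a slot is free
                try
                {
                    // ... process one image inside `buffer` ...
                    Interlocked.Increment(ref processed);
                }
                finally
                {
                    pool.Add(buffer);             // return the slot for the next job
                }
            });

        // Free each buffer exactly once, after all work is done.
        while (pool.TryTake(out IntPtr p))
            Marshal.FreeHGlobal(p);

        return processed;
    }
}
```

Because at most PoolSize buffers exist at any time, memory use is capped regardless of queue length, and the heap lock is only contended PoolSize times at startup.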