Is there a way to get the total number of allocations (note - number of allocations, not bytes allocated)? It can be either for the current thread, or globally, whichever is easier.

I want to check how many objects a particular function allocates, and while I know about the Debug -> Performance Profiler (Alt+F2), I would like to be able to do it programmatically from inside my program.

// pseudocode
int GetTotalAllocations() {
    ...;
}    
class Foo {
    string bar;
    string baz;
}
public static void Main() {
    int allocationsBefore = GetTotalAllocations();
    PauseGarbageCollector(); // do I need this? I don't want the GC to run during the function and skew the number of allocations
    // Some code that makes allocations.
    var foo = new Foo() { bar = "bar", baz = "baz" };
    ResumeGarbageCollector();
    int allocationsAfter = GetTotalAllocations();
    Console.WriteLine(allocationsAfter - allocationsBefore); // Should print 3 allocations - one for Foo, and 2 for its fields.
}

Also, do I need to pause garbage collection to get accurate data, and can I do that?

Do I need to use the CLR Profiling API to achieve that?

cassandrad
sashoalm

3 Answers

4

First up, you can pause the GC by calling System.GC.TryStartNoGCRegion and unpause it with System.GC.EndNoGCRegion.
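As a minimal sketch of how these two calls fit together (the NoGcRegion class, the TryRun helper, and the budgetBytes parameter are hypothetical names introduced here, not part of the framework), the following wraps an action in a no-GC region and reports whether the region was still active when the action finished. It assumes that checking GCSettings.LatencyMode is a sufficient guard before calling GC.EndNoGCRegion.

```csharp
using System;
using System.Runtime;

public static class NoGcRegion
{
    // Hypothetical helper: runs `action` with garbage collection suppressed,
    // provided the runtime can reserve `budgetBytes` up front.
    // Returns true only if the action ran and the no-GC region was still
    // active afterwards, i.e. the budget was not exceeded along the way.
    public static bool TryRun(long budgetBytes, Action action)
    {
        bool started;
        try
        {
            started = GC.TryStartNoGCRegion(budgetBytes);
        }
        catch (ArgumentOutOfRangeException)
        {
            // The requested budget is larger than the runtime allows.
            started = false;
        }

        if (!started)
        {
            return false; // the action was not run at all
        }

        bool stillSuppressed = false;
        try
        {
            action();
        }
        finally
        {
            // If the budget was exceeded, the runtime has already left the
            // region; calling EndNoGCRegion in that state would throw.
            stillSuppressed = GCSettings.LatencyMode == GCLatencyMode.NoGCRegion;
            if (stillSuppressed)
            {
                GC.EndNoGCRegion();
            }
        }
        return stillSuppressed;
    }
}
```

A call such as NoGcRegion.TryRun(1024 * 1024, () => DoWork()) would reserve 1 MB; DoWork is a placeholder for the code under test.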

If you only need the number of bytes allocated, System.GC.GetAllocatedBytesForCurrentThread returns the total bytes allocated so far on the current thread. Call it before and after the code you want to measure; the difference is the number of bytes that code allocated.
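The before/after measurement can be sketched as a small helper. AllocationMeter and MeasureBytes are hypothetical names used only for illustration; the only framework call is GC.GetAllocatedBytesForCurrentThread.

```csharp
using System;

public static class AllocationMeter
{
    // Hypothetical helper: returns the number of bytes allocated on the
    // calling thread while `action` runs.
    public static long MeasureBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }
}
```

The result is a byte count, not an object count, and it includes per-object overhead such as headers. Create the delegate before calling the helper (as happens naturally with a lambda argument) so that its own allocation is not part of the measurement.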

Counting the number of allocations is trickier. There are probably several ways to do it, all of them sub-optimal in some way today. Here is one idea:

Modifying the default GC

Starting with .NET Core 2.1, it is possible to use a custom GC, a so-called local GC. The development experience, documentation, and usefulness are said to be less than ideal, but depending on the details of your problem it may help.

Every time an object is allocated the runtime calls Object* IGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags). IGCHeap is defined here with the default GC implementation here (GCHeap::Alloc implemented in line 37292).

The guy to talk to here would be Konrad Kokosa with two presentations on that topic: #1, #2, slides.

We can take the default GC implementation as is and modify the Alloc-method to increment a counter on each call.

Exposing the counter in managed code

Next, to make use of the new counter, we need a way to read it from managed code, which means modifying the runtime. Here I'll describe how to do that by extending the GC interface (exposed by System.GC).

Note: I do not have practical experience in doing this and there are probably some problems to encounter when going this route. I just want to be precise with my idea.

By taking a look at ulong GC.GetGenerationSize(int) we are able to find out how to add a method which results in an internal CLR call.

Open \runtime\src\coreclr\src\System.Private.CoreLib\src\System\GC.cs#112 and declare a new method:

[MethodImpl(MethodImplOptions.InternalCall)]
internal static extern ulong GetAllocationCount();

Next, we need to define that method on the native GCInterface. For that, go to runtime\src\coreclr\src\vm\comutilnative.h#112 and add:

static FCDECL0(UINT64, GetAllocationCount);

To link these two methods, we need to list them in runtime\src\coreclr\src\vm\ecalllist.h#745:

FCFuncElement("GetAllocationCount", GCInterface::GetAllocationCount)

And lastly, implement the method at runtime\src\coreclr\src\vm\comutilnative.cpp#938:

FCIMPL0(UINT64, GCInterface::GetAllocationCount)
{
    FCALL_CONTRACT;

    return (UINT64)(GCHeapUtilities::GetGCHeap()->GetAllocationCount());
}
FCIMPLEND

That gets a pointer to the GCHeap where our allocation counter lives. The GetAllocationCount method that exposes the counter does not exist yet, so let's create it:

runtime\src\coreclr\src\gc\gcimpl.h#313

size_t GetAllocationCount();

runtime\src\coreclr\src\gc\gcinterface.h#680

virtual size_t GetAllocationCount() = 0;

runtime\src\coreclr\src\gc\gcee.cpp#239

size_t GCHeap::GetAllocationCount()
{
    return m_ourAllocationCounter;
}

For our new method System.GC.GetAllocationCount() to be usable in managed code we need to compile against a custom BCL. Maybe a custom NuGet package will work here too (which defines System.GC.GetAllocationCount() as an internal call as seen above).

Closing

Admittedly, this would be quite a bit of work if not done before and a custom GC + CLR might be a bit overkill here, but I thought I should throw it out there as a possibility.

Also, I have not tested this. You should take it as a concept.

Bruno Zell
  • `GC.TryStartNoGCRegion` takes a parameter; how do I determine what value should be passed there? – cassandrad Apr 17 '20 at 08:02
  • Docs linked, see remarks: The TryStartNoGCRegion(Int64) method [...] disallows garbage collection while an app executes a critical region of code. If the runtime is unable to initially allocate the requested amount of memory, the garbage collector performs a full blocking garbage collection in an attempt to free additional memory and enters no GC region latency mode if it is able to, [...]. **totalSize must be large enough to handle all memory allocations that occur in the critical path. This includes allocations by the app, as well as allocations that the runtime makes on the app's behalf.** – Bruno Zell Apr 17 '20 at 09:06
  • Note that the parameter can't be larger than the size of an ephemeral segment. – Bruno Zell Apr 17 '20 at 09:13
  • But how can I determine how much memory will be allocated if that is exactly what I want to measure and I don't know it beforehand? And what if my function allocates more than an ephemeral segment? I tried that method, but it didn't work out because I didn't know what value to pass to the function. – cassandrad Apr 17 '20 at 09:16
  • I'd go for the maximum then, which can be multiple gigabytes on some systems. When total allocations exceed that limit, the GC kicks in as if we had called GC.EndNoGCRegion. It always depends on the specific use case and it sure has its limitations; it's just that I am not aware of any alternatives. – Bruno Zell Apr 17 '20 at 09:27
  • Oh, now I see why I wasn't able to pass several gigabytes there: I'm using workstation GC, and it has a limit of 256 MB. I guess I have to use this method with server GC to be able to pause the GC with huge allocations. – cassandrad Apr 17 '20 at 10:01
  • Hey, thank you for the answer! It's a bummer that it has to be so difficult. Tbh I was expecting there would be something equivalent to _CrtMemCheckpoint and _CrtMemDifference in VS C++. – sashoalm Apr 17 '20 at 21:37
  • @sashoalm It looks like these functions are utilized in debug mode only. Usually counting the allocations made or even the allocation size in bytes is used for diagnostics or benchmarking only, thus the go-to solution would be out-of-process profilers. What is the larger problem you want to solve? – Bruno Zell Apr 17 '20 at 23:40
  • @cassandrad The thing here is that `GC.TryStartNoGCRegion` & co are not designed for diagnostics but more for code paths where you have to have a _constant_ latency with a low standard deviation. By disabling the GC you can be sure only your code will be executing. – Bruno Zell Apr 17 '20 at 23:47
4

You can record every allocation, but your logic for doing this inside your process is flawed. .NET Core supports in-process ETW data collection, which also makes it possible to record all allocation events. See:

Starting with .NET Core 2.2, CoreCLR events can now be consumed using the System.Diagnostics.Tracing.EventListener class. These events describe the behavior of such runtime services as GC, JIT, ThreadPool, and interop. These are the same events that are exposed as part of the CoreCLR ETW provider. This allows for applications to consume these events or use a transport mechanism to send them to a telemetry aggregation service. You can see how to subscribe to events in the following code sample:

internal sealed class SimpleEventListener : EventListener
{
    // Called whenever an EventSource is created.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Watch for the .NET runtime EventSource and enable all of its events.
        if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
        {
            EnableEvents(eventSource, EventLevel.Verbose, (EventKeywords)(-1));
        }
    }

    // Called whenever an event is written.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Write the contents of the event to the console.
        Console.WriteLine($"ThreadID = {eventData.OSThreadId} ID = {eventData.EventId} Name = {eventData.EventName}");
        for (int i = 0; i < eventData.Payload.Count; i++)
        {
            string payloadString = eventData.Payload[i]?.ToString() ?? string.Empty;
            Console.WriteLine($"\tName = \"{eventData.PayloadNames[i]}\" Value = \"{payloadString}\"");
        }
        Console.WriteLine("\n");
    }
}

If you enable only the GC events (keyword 0x1) instead of -1, that should give you all the GC pause times and GC events you need to diagnose this yourself in-process.
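As a rough sketch of what enabling only the GC keyword might look like, the listener below counts GC start events instead of printing them, which lets you check whether a collection happened during a measured region. GcStartCounter is a hypothetical class name, and matching event names by the prefix "GCStart" is an assumption about how the runtime names these events. Events may also be delivered with some delay, so the counter can lag behind.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

public sealed class GcStartCounter : EventListener
{
    // Keyword 0x1 selects the GC events of the runtime provider.
    private const EventKeywords GcKeyword = (EventKeywords)0x1;

    private long _gcStarts;

    // Number of GC start events observed so far.
    public long GcStarts => Interlocked.Read(ref _gcStarts);

    // Called whenever an EventSource is created.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
        {
            EnableEvents(eventSource, EventLevel.Informational, GcKeyword);
        }
    }

    // Called whenever an event is written.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Assumption: GC start events have names that begin with "GCStart".
        string name = eventData.EventName;
        if (name != null && name.StartsWith("GCStart", StringComparison.Ordinal))
        {
            Interlocked.Increment(ref _gcStarts);
        }
    }
}
```

Incrementing a counter keeps the handler itself free of console output, although the event arguments passed to it are still allocated by the runtime.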

Allocation sampling mechanisms have been built into .NET Core and .NET Framework for ages. They sample object allocations at a rate of up to 5 allocation events/s (GC_Alloc_Low) or up to 100 allocation events/s (GC_Alloc_High). There seems to be no way to get all allocation events, but if you read the .NET Core code

BOOL ETW::TypeSystemLog::IsHeapAllocEventEnabled()
{
    LIMITED_METHOD_CONTRACT;

    return
        // Only fire the event if it was enabled at startup (and thus the slow-JIT new
        // helper is used in all cases)
        s_fHeapAllocEventEnabledOnStartup &&

        // AND a keyword is still enabled.  (Thus people can turn off the event
        // whenever they want; but they cannot turn it on unless it was also on at startup.)
        (s_fHeapAllocHighEventEnabledNow || s_fHeapAllocLowEventEnabledNow);
}

you find that you can get all allocation events via ETW when

  1. ETW allocation profiling is enabled when the process is started (enabling it later will NOT work)
  2. the GC_Alloc_High AND GC_Alloc_Low keywords are enabled

You can record all allocations inside a .NET Core 2.1+ process if an ETW session that records allocation profiling data is present.

Sample:

C>perfview collect  c:\temp\perfViewOnly.etl -Merge:true -Wpr -OnlyProviders:"Microsoft-Windows-DotNETRuntime":0x03280095::@StacksEnabled=true
C>AllocTracker.exe
    Microsoft-Windows-DotNETRuntime
    System.Threading.Tasks.TplEventSource
    System.Runtime
    Hello World!
    Did allocate 24 bytes
    Did allocate 24 bytes
    Did allocate 24 bytes
    Did allocate 76 bytes
    Did allocate 76 bytes
    Did allocate 32 bytes
    Did allocate 64 bytes
    Did allocate 24 bytes
    ... endless loop!

    using System;
    using System.Diagnostics.Tracing;

    namespace AllocTracker
    {
        enum ClrRuntimeEventKeywords
        {
            GC                  = 0x1,
            GCHandle            = 0x2,
            Fusion              = 0x4,
            Loader              = 0x8,
            Jit                 = 0x10,
            Contention          = 0x4000,
            Exceptions          = 0x8000,
            Clr_Type            = 0x80000,
            GC_AllocHigh        = 0x200000,
            GC_HeapAndTypeNames = 0x1000000,
            GC_AllocLow         = 0x2000000,
        }

        class SimpleEventListener : EventListener
        {
            public ulong countTotalEvents = 0;
            public static int keyword;

            EventSource eventSourceDotNet;

            public SimpleEventListener() { }

            // Called whenever an EventSource is created.
            protected override void OnEventSourceCreated(EventSource eventSource)
            {
                Console.WriteLine(eventSource.Name);
                if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
                {
                    EnableEvents(eventSource, EventLevel.Informational, (EventKeywords) (ClrRuntimeEventKeywords.GC_AllocHigh | ClrRuntimeEventKeywords.GC_AllocLow) );
                    eventSourceDotNet = eventSource;
                }
            }
            // Called whenever an event is written.
            protected override void OnEventWritten(EventWrittenEventArgs eventData)
            {
                if( eventData.EventName == "GCSampledObjectAllocationHigh")
                {
                    Console.WriteLine($"Did allocate {eventData.Payload[3]} bytes");
                }
                    //eventData.EventName
                    //"BulkType"
                    //eventData.PayloadNames
                    //Count = 2
                    //    [0]: "Count"
                    //    [1]: "ClrInstanceID"
                    //eventData.Payload
                    //Count = 2
                    //    [0]: 1
                    //    [1]: 11

                    //eventData.PayloadNames
                    //Count = 5
                    //    [0]: "Address"
                    //    [1]: "TypeID"
                    //    [2]: "ObjectCountForTypeSample"
                    //    [3]: "TotalSizeForTypeSample"
                    //    [4]: "ClrInstanceID"
                    //eventData.EventName
                    //"GCSampledObjectAllocationHigh"
            }
        }

        class Program
        {
            static void Main(string[] args)
            {
                SimpleEventListener.keyword = (int)ClrRuntimeEventKeywords.GC;
                var listener = new SimpleEventListener();

                Console.WriteLine("Hello World!");

                Allocate10();
                Allocate5K();
                GC.Collect();
                Console.ReadLine();
            }
            static void Allocate10()
            {
                for (int i = 0; i < 10; i++)
                {
                    int[] x = new int[100];
                }
            }

            static void Allocate5K()
            {
                for (int i = 0; i < 5000; i++)
                {
                    int[] x = new int[100];
                }
            }
        }

    }

Now you can find all allocation events in the recorded ETL file: one method performing 10 array allocations and another performing 5000.

PerfView Allocation Recording

The reason I said your logic is flawed is that even a simple operation like printing the allocation events to the console will allocate objects. You see where this ends up? For this to work, the complete code path must be allocation-free, which I guess is not possible because at the very least the ETW event listener needs to allocate your event data. You would reach the goal but crash your application. I would therefore rely on ETW and record the data from the outside, or use a profiler, which for the same reason needs to be unmanaged.

With ETW you get all allocation stacks and type information, which is all you need not only to report allocations but also to find the offending code. There is more to say about method inlining, but this is already enough for an SO post.

Alois Kraus
  • Thank you! Unfortunately I already awarded the bounty a couple of hours ago or I would have awarded it to you. Btw there would be no allocations if we just increment a counter inside OnEventWritten(). We can check the counter before/after a function to see how many allocations it performed. – sashoalm Apr 18 '20 at 06:36
  • As I said, the EventWrittenEventArgs passed to you will already have been allocated. That will cause an endless loop unless something extra is implemented to prevent it. You need to try it out and raise an issue for .NET Core if it is important to you. You can still mark it as the answer. – Alois Kraus Apr 18 '20 at 06:53
-1

You need to use some kernel32 functions, but it is possible! I have not written the full code, but I hope this gives you a feeling for how it should be done.

First, get all processes with Process.GetProcesses. Then create a snapshot with CreateToolhelp32Snapshot; because you work on a snapshot, you do not need to pause the GC. After that, loop over all memory blocks: initialize the loop with Heap32ListFirst and Heap32First, then keep calling Heap32Next for as long as it succeeds.

You can call a kernel32 function once it is declared in your code like this:

[DllImport("kernel32", SetLastError = true, CharSet = System.Runtime.InteropServices.CharSet.Auto)]
static extern IntPtr CreateToolhelp32Snapshot([In]UInt32 dwFlags, [In]UInt32 th32ProcessID);

Here is the C++ sample; you can do the same in C# after declaring the functions: Traversing the Heap List

I know it is not easy, but there is no simple way. By the way, if you call Toolhelp32ReadProcessMemory inside the loop, you can retrieve a lot of other useful information.

I also found pinvoke.net; maybe it helps you:

https://www.pinvoke.net/default.aspx/kernel32.createtoolhelp32snapshot https://www.pinvoke.net/default.aspx/kernel32.Heap32ListFirst

György Gulyás
  • Thank you for the proposed solution! A few comments - this enumerates blocks at the native level, which would include any memory allocated by native code. What is a bigger issue IMO is that the C# allocator might allocate a few large blocks of memory and manage them itself. Then allocating a new C# object would not correlate to any new heap allocations from the point of view of the kernel. – sashoalm Apr 13 '20 at 06:55
  • @sashoalm Maybe not :) C# and C++ interoperate, and I think each allocated C# memory block is available to and can be handled in C++, and vice versa. Maybe each block is handled by kernel32. It would have to be tested. The kernel32 memory management is pretty good, so maybe the .NET runtime does not have its own. – György Gulyás Apr 13 '20 at 09:19
  • CreateToolhelp32Snapshot deals with the C/C++ heap, which is disjoint from the managed heap. The two have no correlation. The GC allocates its memory directly via VirtualAlloc, which is the same allocation method the C/C++ heap uses. The managed and unmanaged heaps share the same foundation, but their heap management is unrelated. The GC heap does not build upon the unmanaged heap. – Alois Kraus Apr 15 '20 at 06:47
  • @AloisKraus any source? – György Gulyás May 18 '20 at 20:32
  • @GyörgyGulyás: See https://raw.githubusercontent.com/dotnet/coreclr/d33f73f69051d2861454081bb3211615413d8ed0/src/gc/gc.cpp and look at the virtual_alloc method which calls into VirtualAlloc. This is the common mechanism used by the C-Allocator (malloc, new, ...) to allocate memory from the OS and then manage its own heap. – Alois Kraus May 18 '20 at 22:50