
I made a simple benchmark to test the HashSet.Add method, and the results look strange to me.

For benchmarking I use BenchmarkDotNet, available on GitHub. Here is the full code of the benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;

public struct Point
{
    public int X, Y;
}

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    [Params(10, 500, 1000)]
    public int ArrayLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        hs = new HashSet<Point>(100);  // Capacity of 100 leaves plenty of room for 1 element
        p = new Point();               // Point is a struct, but I still initialize it in Setup
        hs.Add(p);                     // Warm-up add, so the set has already held 1 element
    }

    Point p;
    HashSet<Point> hs;

    [Benchmark]
    public void Struct()
    {
        for (var i = 0; i < ArrayLength; i++)
        {                              // The test repeats the same operation multiple times
            hs.Clear();                // Clear hashset so there will be 0 elements
            hs.Add(p);                 // Add 1 element back
        }
    }
}

My expectation is that, since the HashSet is already initialized with enough capacity and the element was already added once before the Clear, there should be no additional allocations at all. In reality, here is the benchmark result from running the command

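dotnet run -c Release

| Method | ArrayLength | Mean        | Error       | StdDev      | Median      | Gen 0  | Allocated |
|--------|-------------|-------------|-------------|-------------|-------------|--------|-----------|
| Struct | 10          | 644.2 ns    | 12.90 ns    | 28.86 ns    | 633.8 ns    | 0.0572 | 240 B     |
| Struct | 500         | 30,080.1 ns | 151.63 ns   | 134.42 ns   | 30,096.3 ns | 2.8687 | 12,000 B  |
| Struct | 1000        | 64,610.0 ns | 1,264.29 ns | 1,352.78 ns | 64,436.2 ns | 5.7373 | 24,000 B  |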

The number of allocated bytes equals the number of iterations * 24, so each Add performs a new allocation.

A few comments:

  • I tried the same benchmark with just Clear: 0 allocations, so the allocation definitely comes from Add.
  • Changing the number of fields in the struct DOES change the total number of allocated bytes. For example, 7 int fields use 48 bytes per operation.

UPDATE

The problem is not the struct copy. I wrote another test case to check that:

private Point t;
public void Test(Point p){
    t = p;
}

[Benchmark]
public void Struct()
{
    var test = new Point { X = 1, Y = 2 };  // Point declares no constructor, so use an object initializer
    Test(test);
}

And the result is Allocated = 0.

ApmeM
  • So `Point` is a struct you say. With `struct` arithmetics ... right? So, when you add a struct to a datastructure (i.e. passing it by-value as an argument), what do you think happens? See https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/struct#passing-structure-type-variables-by-reference – Fildor Jun 06 '23 at 07:20
  • Yes, this is how structs work. At the same time, structs have a different lifecycle and are stored in a different place. And as I understand it, this 'Allocated' column shows only managed memory; structs are not included there. As a confirmation I did the following test: `Point t; void Benchmark() { Test(new Point { X = 1, Y = 2 }); } void Test(Point p) { t = p; }` And the result is Allocated = 0 – ApmeM Jun 06 '23 at 16:47
  • I added an update for this case to the main question. – ApmeM Jun 06 '23 at 17:00
  • Everything that is "safe" in C# is managed. I believe you are mixing the concept of Stack memory with unmanaged memory – Guilherme Jun 06 '23 at 17:49
  • That may be. But the test in the update and the test under the other comment show that this is not the root cause. Either the Allocated column shows something wrong, or the HashSet Add method really does allocate on each call. List does not. – ApmeM Jun 07 '23 at 06:01

2 Answers


As a confirmation I did the following test [...] And in result Allocation = 0

Your test stores the copy on the stack (in Struct) and in a class field that's always there (in Test), so of course you won't get any delta allocations. Both entries will always be there using memory.

And as I understand this 'Allocated' column shows only managed memory

Structs are managed memory, as are all collections holding them. A better test to visualize how memory allocation works in .NET would be to add a bunch of value objects (structs) to a list and watch the allocated (managed) memory go up.
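
A minimal sketch of that list test, assuming `GC.GetAllocatedBytesForCurrentThread()` as the measurement instead of BenchmarkDotNet. `ListGrowthDemo` and `MeasureAdd` are hypothetical names, and the list is deliberately created without a capacity so its backing array has to grow:

```csharp
using System;
using System.Collections.Generic;

public struct Point
{
    public int X, Y;
}

public static class ListGrowthDemo
{
    // Returns the managed bytes allocated on the current thread
    // while adding `count` points to a freshly created list.
    public static long MeasureAdd(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();

        var list = new List<Point>();       // no capacity: the backing array grows as needed
        for (var i = 0; i < count; i++)
        {
            list.Add(new Point { X = i, Y = i });
        }

        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(list);                 // keep the list reachable until after the measurement
        return after - before;
    }
}
```

Calling `ListGrowthDemo.MeasureAdd(1000)` should report at least the 8 bytes per stored `Point`, since the structs live inside the list's managed array.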

Blindy
  • OK, I just tried this as well: I took the code from my original benchmark and did a search-and-replace of HashSet with List. No other changes to the benchmark. The result is 0 (zero) allocations again... – ApmeM Jun 06 '23 at 17:50
  • To be on the safe side, I replaced List with HashSet again and got allocations = 24 * iterations again. – ApmeM Jun 06 '23 at 17:51

OK, the problem is found:

HashSet uses the GetHashCode method (what a surprise :) ). But this method is defined at the Object level.

Calling a method inherited from Object on a struct that does not override it requires .NET to box the struct, and that box is the memory allocated during execution. The final test that proves it:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;

public struct Point
{
    public int X, Y;
}

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    private Point p;

    [GlobalSetup]
    public void Setup()
    {
        p = new Point { X = 1, Y = 2 };  // Point is a struct, but I still initialize it in Setup
    }

    [Benchmark]
    public void Struct()
    {
        var hc = p.GetHashCode();
    }
}

And the output:

| Method | Mean     | Error    | StdDev   | Gen 0  | Allocated |
|--------|----------|----------|----------|--------|-----------|
| Struct | 40.03 ns | 0.559 ns | 0.523 ns | 0.0057 | 24 B      |
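
The 24 B figure is consistent with one boxed Point, and the 48 bytes seen with 7 int fields fits the same arithmetic. A rough sketch of that calculation follows. It assumes a 64-bit runtime where each heap object carries 16 bytes of overhead (object header plus method table pointer), sizes are rounded up to a multiple of 8, and the smallest object is 24 bytes. `BoxSizeEstimate` and `BoxedSize` are hypothetical names, not a runtime API:

```csharp
using System;

public static class BoxSizeEstimate
{
    // Estimates the heap size of a boxed struct with `payloadBytes` of fields.
    // Assumption: 64-bit runtime, 8-byte object header + 8-byte method table
    // pointer, 8-byte alignment, 24-byte minimum object size.
    public static int BoxedSize(int payloadBytes)
    {
        const int overhead = 16;
        const int minimum = 24;

        // Round (overhead + payload) up to the next multiple of 8.
        int rounded = (overhead + payloadBytes + 7) & ~7;
        return Math.Max(minimum, rounded);
    }
}
```

With these assumptions, `BoxedSize(8)` (two int fields) gives 24 and `BoxedSize(28)` (seven int fields) gives 48, matching both measurements.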

SOLUTION

To solve the issue, GetHashCode should be overridden in the struct.

Also do not forget to override bool Equals(object obj) and to implement IEquatable<Point>, as HashSet.Contains also uses those methods.
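
A minimal sketch of such a struct, keeping the field names from the question. The use of `HashCode.Combine` is my choice here (any consistent hash of X and Y would do), and `AllocationCheck.MeasureClearAdd` is a hypothetical helper for a quick check outside BenchmarkDotNet:

```csharp
using System;
using System.Collections.Generic;

public struct Point : IEquatable<Point>
{
    public int X, Y;

    // Strongly typed equality: the default comparer can call this without boxing.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    // Object-based equality, kept consistent with the typed version.
    public override bool Equals(object obj) => obj is Point other && Equals(other);

    // Overridden on the struct itself, so no box is needed to call it.
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public static class AllocationCheck
{
    // Returns the managed bytes allocated on the current thread by
    // `iterations` Clear/Add cycles, mirroring the loop in the question.
    public static long MeasureClearAdd(int iterations)
    {
        var set = new HashSet<Point>(100);
        var p = new Point { X = 1, Y = 2 };
        set.Add(p);                         // warm-up add, as in the original Setup

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < iterations; i++)
        {
            set.Clear();
            set.Add(p);
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

With this version of Point the per-Add allocation should disappear. I have not included measured numbers for it here, so rerun the original benchmark with MemoryDiagnoser to confirm.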

ApmeM
  • Oh hey, I actually ran into something similar a while back too, but for me it was `Enum.HasFlag`, which doesn't have a generic version, and you can't derive from enums, so I had to just replace it with manual bit operations. You're luckier! – Blindy Jun 07 '23 at 14:32