
I know HashSets are inherently unordered, but of course the collection is stored in some order, probably based on the hash buckets. The First extension method grabs the first element and delivers it to the caller. My question is the following: since the .NET platform is a standard with potentially several implementations, is it written in stone that the First extension method (from the System.Linq namespace) should always return the same element for unordered collections like HashSets, as long as the contents of the collection don't change? I'm imagining things like memory optimisations moving instances around; if stable ordering is not one of the standard's requirements for First, that could lead to different behaviour on different implementations of the platform.

'Can I rely on First to behave consistently, both now and in the future, on any device?' is the gist of what I'm asking.

FinnTheHuman
  • Unless you cache a .First() on a HashSet today on Win10 and compare it to a .First() on Android 12.3 in 15 years, for the exact same HashSet, you won't ever know. Just hope the HashSet implementation is the same then. – Patrick Artner Dec 01 '17 at 16:53
  • This answer explains that it would be a very bad idea to rely on that: https://stackoverflow.com/a/657289/2651069 – Tao Gómez Gil Dec 01 '17 at 16:56
  • I'm curious: why would you ever need to rely on this? – Evk Dec 01 '17 at 17:13
  • My solution to a problem that's not that simple to explain... basically a little part of a spell checker that uses no artificial intelligence whatsoever. A dumb spell checker. It's for my mum, an add-on for her to use in her work software, because if she types something slightly different, say an í instead of an i, she's in trouble. That's a little part of the solution: a tiny algorithm for a tiny task inside the whole thing. – FinnTheHuman Dec 01 '17 at 17:18
  • Maybe you need `SortedSet` instead (same as a hash set, but sorted). At least on that, it makes sense to call First (unlike on a HashSet). – Evk Dec 01 '17 at 17:43
  • That could work. I don't exactly need the order of the elements among themselves, but I doubt it would break anything if that's what I got. And it would be a little price to pay compared to `List.Contains` on every insertion. Thank you for that advice. – FinnTheHuman Dec 01 '17 at 18:21
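Evk's `SortedSet` suggestion can be sketched as follows. This is a minimal illustration, assuming the elements are strings compared with `StringComparer.Ordinal` so the order does not depend on the current culture; the class name `SortedFirstDemo` and the sample words are made up for the example. Because `SortedSet<T>` enumerates in comparer order, `First()` is the minimum element, which is also exposed as the `Min` property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedFirstDemo
{
    // Builds a SortedSet with an explicit ordinal comparer so the order
    // is the same regardless of the machine's culture settings.
    public static SortedSet<string> Build(IEnumerable<string> words) =>
        new SortedSet<string>(words, StringComparer.Ordinal);

    public static void Main()
    {
        var words = Build(new[] { "pear", "apple", "fig", "apple" });

        // Duplicates are ignored, just as with HashSet<T>.
        Console.WriteLine(words.Count);   // 3

        // Enumeration follows the comparer, so First() is the smallest element.
        Console.WriteLine(words.First()); // apple
        Console.WriteLine(words.Min);     // apple
    }
}
```

The trade-off is that `SortedSet<T>.Add` and `Contains` are O(log n) rather than the O(1) average of `HashSet<T>`, which is still far cheaper than `List.Contains` on every insertion.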

2 Answers


You can rely on First() to return the first element returned by the GetEnumerator() call on the collection you called First() on.

However, as with any unordered collection, it is unspecified whether the first item returned from HashSet.GetEnumerator() will always be the same item across multiple calls on an unchanged collection. It may return the same item today, but there is no contract stating it needs to stay that way in future versions.
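To make the point concrete, here is a simplified sketch of what `First()` boils down to for a source like `HashSet<T>` that is not an `IList<T>`: it takes whatever the enumerator yields first. This is not the actual System.Linq source (which has extra fast paths); `FirstSketch` and `FirstViaEnumerator` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FirstSketch
{
    // Hypothetical, simplified equivalent of Enumerable.First for a
    // non-list source: return the first element the enumerator yields.
    public static T FirstViaEnumerator<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (e.MoveNext())
                return e.Current;
        }
        throw new InvalidOperationException("Sequence contains no elements");
    }

    public static void Main()
    {
        var set = new HashSet<int> { 42, 7, 19 };
        // The result is decided entirely by the order HashSet<T>.GetEnumerator()
        // happens to use, and the HashSet<T> docs promise no particular order.
        Console.WriteLine(FirstViaEnumerator(set));
    }
}
```

So any guarantee about `First()` on a `HashSet<T>` can only be as strong as the guarantee about its enumeration order, and there is none.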

Scott Chamberlain
  • I understand. That's a shame... My whole algorithm just crumbled... I'll have to use a list and Contains instead I suppose. Thanks. – FinnTheHuman Dec 01 '17 at 17:05
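A list with `Contains` is not the only way to get a well-defined "first" element. A minimal sketch, assuming you only need add, membership and first-inserted lookups: keep a `HashSet<T>` for O(1) average membership checks alongside a `List<T>` that records insertion order. `InsertionOrderedSet<T>` is a hypothetical helper, not a .NET type, and removal is deliberately left out because removing from the list would be O(n).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of .NET): a set that remembers insertion order.
public sealed class InsertionOrderedSet<T>
{
    private readonly HashSet<T> _members = new HashSet<T>(); // fast membership checks
    private readonly List<T> _order = new List<T>();         // insertion order

    public int Count => _order.Count;

    // Returns false if the item was already present, like HashSet<T>.Add.
    public bool Add(T item)
    {
        if (!_members.Add(item))
            return false;
        _order.Add(item);
        return true;
    }

    public bool Contains(T item) => _members.Contains(item);

    // The earliest inserted item; well defined because List<T> keeps order.
    public T First()
    {
        if (_order.Count == 0)
            throw new InvalidOperationException("The set is empty.");
        return _order[0];
    }
}

public static class InsertionOrderedSetDemo
{
    public static void Main()
    {
        var set = new InsertionOrderedSet<string>();
        set.Add("cat");
        set.Add("act");
        set.Add("cat"); // duplicate, ignored
        Console.WriteLine(set.Count);   // 2
        Console.WriteLine(set.First()); // cat
    }
}
```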

Test:
  • create multiple HashSets
  • after each insert of a number into a HashSet, call First() and remember the result in another HashSet
  • after all inserts, every HashSet contains the same data (-1000 to 1000, inserted in a different random order each run)

Printing all First() results:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    // Yields every integer from min to max, inclusive.
    static IEnumerable<int> Range(int min, int max)
    {
        for (int i = min; i <= max; i++)
            yield return i;
    }

    public static void Main()
    {
        // Collects every distinct value First() returned during one run.
        var firsts = new HashSet<int>();
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine("Run: " + i.ToString());
            var h = new HashSet<int>();
            // Insert -1000..1000 in random order (shuffled by random Guid keys).
            foreach (var num in Range(-1000, +1000).OrderBy(o => Guid.NewGuid()).ToList())
            {
                h.Add(num);
                if (h.Count == 1)
                    Console.WriteLine("first value inserted: " + num.ToString());
                // Record what First() returns after every single insert.
                firsts.Add(h.First());
            }

            // A single value here means First() never changed during this run.
            Console.WriteLine("All firsts: " + string.Join(",", firsts));
            firsts.Clear();
        }

        Console.ReadLine();
    }
}

Observations:

  • each HashSet on its own produces only one First() value over all 2001 inserts
  • different HashSets produce different First() values if they started with a different first int

    Run: 0
    first value inserted: 507
    All firsts: 507
    Run: 1
    first value inserted: 511
    All firsts: 511
    Run: 2
    first value inserted: -600
    All firsts: -600
    Run: 3
    first value inserted: -624
    All firsts: -624
    Run: 4
    first value inserted: -367
    All firsts: -367
    Run: 5
    first value inserted: -110
    All firsts: -110
    Run: 6
    first value inserted: 983
    All firsts: 983
    ... etc ...

  • a HashSet instance seems to keep the same First() over its lifetime
  • it is the first item put into the HashSet (in these runs, which never remove anything)
  • differently filled HashSets seem to have different First() values when their first inserted value differs
  • At least under Windows and the current compiler, HashSet.First() seems to be predictable
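The runs above only ever add elements. A minimal sketch of a case they don't exercise: remove one element and add another. The values are arbitrary, and what `First()` prints here depends on internal slot reuse, which is an implementation detail rather than a documented contract. On current .NET implementations it may well print 4 rather than 2 (the earliest-inserted element still present), but nothing guarantees either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemovalDemo
{
    public static HashSet<int> Build()
    {
        var h = new HashSet<int> { 1, 2, 3 };
        h.Remove(1); // frees an internal slot
        h.Add(4);    // may reuse that slot (implementation detail)
        return h;
    }

    public static void Main()
    {
        var h = Build();
        // Enumeration follows internal storage, not insertion history,
        // so this is not necessarily the oldest element still in the set.
        Console.WriteLine(h.First());
    }
}
```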
Patrick Artner
  • Yes, a `HashSet` seems to return the same value for `First` on my machine over the course of a little more than 40 minutes of testing. But my problem is knowing whether that's a contract in the .NET standard. For example, the standard says long integers have 64 bits. You know that for sure: no matter which platform, no matter when (unless a breaking change is made to the standard), long ints will have 64 bits. – FinnTheHuman Dec 01 '17 at 17:16