
I often do this...

private void Check()
{
    string s = "blah";

    if (new HashSet<string> { "Joe", "Eddie", "Buckethead" }.Contains(s))
        Debug.Log("Guitarist.");
}

In the pipeline, is the HashSet in fact created only once (at startup? at compile time?) and then reused on every call?

By the way, I assume that if you do this:

private HashSet<string> g = new HashSet<string> { "Joe", "Eddie", "Buckethead" };

private void Check()
{
    string s = "blah";

    if (g.Contains(s))
        Debug.Log("Guitarist.");
}

then of course it is only done once, when the class is instantiated. (Or perhaps at compile time or launch time? But in any event, only once.)

Fattie
  • Where does the first code snippet live? Inside a method? If so, then a new hashset is created each time the method is invoked. – Yacoub Massad Feb 11 '16 at 23:18
  • In .NET, strings that have the same value are actually references to the same string object. – Yacoub Massad Feb 11 '16 at 23:25
  • @YacoubMassad: That is true by default for *string literals* but not for all strings. The default can also be changed http://stackoverflow.com/questions/16233435/how-to-prevent-string-being-interned and you can explicitly intern strings that are not string literals https://msdn.microsoft.com/en-us/library/system.string.intern%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396 – Eric J. Feb 11 '16 at 23:29
  • Didn't know that. Thanks @EricJ. – Yacoub Massad Feb 11 '16 at 23:32
  • @Enigmativity For the record, do you think "inline collection initialization" (or indeed just "inline initialization") is the usual and best term for what I am referring to with that phrase? Perhaps it's a "literal collection initialization"? – Fattie Feb 12 '16 at 00:48
  • "new" means, well, **new**. You run the code, you get a new object. You run the code twice, you get two new objects. – Eric Lippert Feb 12 '16 at 05:21

3 Answers


Here's the IL for your first method:

Check:
IL_0000:  ldstr       "blah"
IL_0005:  stloc.0     // s
IL_0006:  newobj      System.Collections.Generic.HashSet<System.String>..ctor
IL_000B:  dup
IL_000C:  ldstr       "Joe"
IL_0011:  callvirt    System.Collections.Generic.HashSet<System.String>.Add
IL_0016:  pop
IL_0017:  dup
IL_0018:  ldstr       "Eddie"
IL_001D:  callvirt    System.Collections.Generic.HashSet<System.String>.Add
IL_0022:  pop
IL_0023:  dup
IL_0024:  ldstr       "Buckethead"
IL_0029:  callvirt    System.Collections.Generic.HashSet<System.String>.Add
IL_002E:  pop
IL_002F:  ldloc.0     // s
IL_0030:  callvirt    System.Collections.Generic.HashSet<System.String>.Contains
IL_0035:  brfalse.s   IL_0041
IL_0037:  ldstr       "Guitarist."
IL_003C:  call        System.Console.WriteLine
IL_0041:  ret

Here's the IL for the second method:

Check:
IL_0000:  ldstr       "blah"
IL_0005:  stloc.0     // s
IL_0006:  ldarg.0
IL_0007:  ldfld       g
IL_000C:  ldloc.0     // s
IL_000D:  callvirt    System.Collections.Generic.HashSet<System.String>.Contains
IL_0012:  brfalse.s   IL_001E
IL_0014:  ldstr       "Guitarist."
IL_0019:  call        System.Console.WriteLine
IL_001E:  ret

And that's from an optimized build of the code.

So, yep, the first creates a new HashSet every time.

Oh, I changed Debug.Log to Console.WriteLine, but that's a trivial change.

Enigmativity

new will create the new object at runtime exactly at the point in time when the thread reaches that line.

So in the case of the first snippet, each invocation of the Check method will cause a new HashSet object to be created.

In the second case, a new HashSet object is created every time you construct a new instance of the containing class.
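
A minimal sketch of that difference, assuming a hypothetical `Guitarists` class with made-up `PerCall` and `PerInstance` members (none of these names come from the question). It only compares object references:

```csharp
using System;
using System.Collections.Generic;

public class Guitarists
{
    // Instance field: the initializer runs once per constructed Guitarists object.
    private readonly HashSet<string> perInstance =
        new HashSet<string> { "Joe", "Eddie", "Buckethead" };

    // Builds a fresh set on every call, like the first snippet in the question.
    public static HashSet<string> PerCall()
    {
        return new HashSet<string> { "Joe", "Eddie", "Buckethead" };
    }

    public HashSet<string> PerInstance()
    {
        return perInstance;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Two calls, two distinct objects.
        Console.WriteLine(object.ReferenceEquals(Guitarists.PerCall(), Guitarists.PerCall())); // False

        var a = new Guitarists();
        var b = new Guitarists();

        // Same instance, same set; different instances, different sets.
        Console.WriteLine(object.ReferenceEquals(a.PerInstance(), a.PerInstance())); // True
        Console.WriteLine(object.ReferenceEquals(a.PerInstance(), b.PerInstance())); // False
    }
}
```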

Yacoub Massad

This code

{
    string s = "blah";

    if ( new HashSet<string>{"Joe","Eddie","Buckethead"}.Contains(s) )
       Debug.Log("Guitarist.");
}

always instantiates a new HashSet<string>.

By the way, that new object is eligible for garbage collection as soon as it is no longer referenced, in this case once the Contains call has returned (and certainly by your closing }).

You are correct that making it an instance field will initialize it once per object, when that instance is constructed. You can use the readonly keyword in that case to prevent the field from being reassigned after the object is constructed (the contents of the set can still be modified through Add or Remove). If you have an expensive initialization that never changes and potentially many instances of your class, you can mark the field as static so that a single instance of the HashSet<string> is shared among all object instances.
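
A rough sketch of the static readonly variant, assuming a hypothetical `Checker` class and `IsGuitarist` method (names invented for illustration, and `Console.WriteLine` standing in for `Debug.Log`):

```csharp
using System;
using System.Collections.Generic;

public class Checker
{
    // static: one set shared by every Checker instance, built once when the type is initialized.
    // readonly: the field cannot be reassigned afterwards (the set itself is still mutable).
    private static readonly HashSet<string> Guitarists =
        new HashSet<string> { "Joe", "Eddie", "Buckethead" };

    public bool IsGuitarist(string s)
    {
        return Guitarists.Contains(s);
    }
}

public static class Program
{
    public static void Main()
    {
        var checker = new Checker();
        Console.WriteLine(checker.IsGuitarist("Eddie")); // True
        Console.WriteLine(checker.IsGuitarist("blah"));  // False
    }
}
```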

hmm .. ok but compilers know to only make once other literals, right? (eg, "strings" etc)

Constants don't incur any initialization overhead. The compiler can use the literal value where appropriate.

By default, string literals are interned, meaning that memory for a given literal value is allocated only once.
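
As a small illustration, assuming both literals live in the same assembly (the class name `InternDemo` is made up), two identical literals end up as one object, while a string built at run time with the same value does not:

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        string a = "Joe";
        string b = "Joe";

        // Both literals refer to the same string object.
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        // A string built at run time has the same value but is a separate object.
        string c = new string(new[] { 'J', 'o', 'e' });
        Console.WriteLine(a == c);                       // True (value comparison)
        Console.WriteLine(object.ReferenceEquals(a, c)); // False
    }
}
```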

Eric J.
  • Huh .. in the second code fragment, the syntax to use for readonly is `private readonly HashSet – Fattie Feb 11 '16 at 23:24
  • Yes `private readonly HashSet<string> g = ...` Note that with or without readonly, the initialization will only happen once. The readonly keyword prevents the value from changing after field initializers and the constructor have run, which may enable the optimizer to make some assumptions. – Eric J. Feb 11 '16 at 23:26
  • Ah, a pool (interned). Got it. My God, this answer has like 3 world-class insights. (I didn't even think about `static` for that sort of constant - incredible. Facepalm.) Wow thanks. – Fattie Feb 11 '16 at 23:27
  • Here's a follow up question, purely to help me understand better: regarding my second code fragment. If I'm not mistaken, it *would be possible* for a compiler to guess that, I "could have" used the second code pattern. Does that sound right? (TBC I'm not in any way saying "compilers should do that" - I'm just trying to clarify my understanding.) Another way to look at it, there is never a situation where code fragment 1 can not be converted to code fragment 2. ... again am I correct in saying that? – Fattie Feb 11 '16 at 23:30
  • A theoretical compiler could analyze the second code fragment and decide it's both wise and safe to keep the initialized `HashSet` around for the next call. I'm not a compiler expert, but I suppose the effort to prove both of those conditions is rather significant. Since the programmer has patterns to ensure the initialized data will get re-used, I doubt it will be a priority any time soon. – Eric J. Feb 11 '16 at 23:33
  • Got it. Again just to be clear I was just, uh, clarifying my understanding there. (i.e., ensuring there was "nothing I was missing".) Thanks again. Thanks. – Fattie Feb 11 '16 at 23:35
  • @JoeBlow - The compiler designers probably wouldn't make this kind of optimization as the code isn't necessarily deterministic. We, as humans, know that the result of calling `new HashSet<string>{"Joe","Eddie","Buckethead"}` will always produce the same `HashSet`, but under the hood it is calling `.Add` and there's no way that the compiler knows that that method doesn't have side-effects - it would be trivial to make a collection that had side-effects - so it can't optimize. – Enigmativity Feb 11 '16 at 23:40
  • @JoeBlow - Further, there's no way for the compiler to know that `HashSet` doesn't use a stupendous amount of RAM so it might have been the developer's intent to allow the object to exist only briefly and be GC'ed asap. For the compiler to somehow retain the potentially massive object would be a terrible situation. – Enigmativity Feb 11 '16 at 23:42
  • Enig: Amazing x2, regarding (a) by side effects, do you mean (for example) if the added items ("Joe" etc in the example) are not merely string constants? Would it be the case that if they are just string constants or just constants, then, there couldn't be any side effects? (Or - shock - is there something else I don't know?) (b) Lion Meme Facepalm, of course you're right there, duh. – Fattie Feb 11 '16 at 23:45
  • [Sometimes Stackoverflow makes you feel like Luke](https://www.youtube.com/watch?v=535Zy_rf4NU) – Fattie Feb 11 '16 at 23:48
  • @JoeBlow: A collection could theoretically do something like `public void Add(string item) { dataStructure.Add(DateTime.Now.ToString() + " " + item); }` This case is a bit contrived (but might happen with a collection specialized in logging...). – Eric J. Feb 11 '16 at 23:49
  • @JoeBlow - Don't forget to do the `@` notification to alert people to your comment. I only saw yours because I had this page still open. – Enigmativity Feb 12 '16 at 00:28
  • @JoeBlow - Eric's answer is spot on. The `.Add` method could do all sorts of extra things that affect state. – Enigmativity Feb 12 '16 at 00:29
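
To make the side-effect point from these comments concrete, here is a contrived sketch with a hypothetical `CountingSet` collection (invented for illustration, not a real library type) whose `Add` changes shared state. Caching the initialized object across calls would change what the program observes:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection whose Add has an observable side effect.
public class CountingSet : IEnumerable<string>
{
    // Incremented every time any instance adds an item.
    public static int TotalAdds;

    private readonly HashSet<string> items = new HashSet<string>();

    public void Add(string item)
    {
        TotalAdds++;
        items.Add(item);
    }

    public bool Contains(string item)
    {
        return items.Contains(item);
    }

    public IEnumerator<string> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class SideEffectDemo
{
    public static bool Check(string s)
    {
        // Same syntax as in the question; the compiler turns it into three Add calls.
        return new CountingSet { "Joe", "Eddie", "Buckethead" }.Contains(s);
    }

    public static void Main()
    {
        Check("blah");
        Check("blah");
        Console.WriteLine(CountingSet.TotalAdds); // 6
    }
}
```

If a compiler silently built the collection only once, the counter would read 3 instead of 6, which is why it cannot make that substitution without proving `Add` is free of side effects.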