Why is this implemented as a struct?

Question

In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:

internal struct ItemList<T> where T : class
{
    private T[] items;
    private int count;
    ...(methods)...
}

(It took me longer than it should have to discover this. I couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)

My question is what are the benefits of implementing this as a struct over a class?

Attention attention: Paging doctor Skeet, I repeat: Paging doctor Skeet — sehe, Jun 01 '11 at 06:39
@manojlds - **[Wrong.](http://meta.stackexchange.com/questions/555/why-does-jon-skeet-never-sleep/566#566)** If this is a normal day, he'd be at work, with "some SO presence". — Kobi, Jun 01 '11 at 07:51
@Kobi - I thought he was in UK? Well anyway, around Skeet, I am always wrong! — manojlds, Jun 01 '11 at 13:34
Maybe there are no benefits in this case but it was done so on the whim of a developer and thus stands as it is because there is no advantage or pressing reason to change. — , Jun 02 '11 at 20:00
The benefit is obvious: fewer cache misses, since you have items in the cache line when you access the ItemList. With a class implementation there is one more indirection. — bestsss, Jun 04 '11 at 12:31

Martin Liversage · Answer 1 · 2011-06-01T11:40:12.800

Let's assume that you want to store ItemList<T> in an array.

Allocating an array of value types (struct) will store the data inside the array. If, on the other hand, ItemList<T> were a reference type (class), only references to ItemList<T> objects would be stored inside the array. The actual ItemList<T> objects would be allocated separately on the heap. An extra level of indirection would be required to reach an ItemList<T> instance, and as it is simply an array combined with a count, it is more efficient to use a value type.

[Diagram: struct vs class]

After inspecting the code for EntitySet<T> I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class had been used instead, the EntitySet<T> would have contained references pointing to ItemList<T> objects allocated separately.

The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
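The difference in layout also explains the observation in the question that the field never throws a null reference exception. The following is a minimal sketch, not the real System.Data.Linq code: `DemoList<T>`, `RefList`, `StructHolder` and `ClassHolder` are hypothetical stand-ins, and the growth policy in `Add` is an assumption made only for illustration.

```csharp
using System;

// Hypothetical stand-in for the internal ItemList<T>; not the real implementation.
struct DemoList<T> where T : class
{
    private T[] items;   // stays null until the first Add
    private int count;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (items == null) items = new T[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

// The struct's two fields live inside each StructHolder object.
class StructHolder { public DemoList<string> List; }

// A class-based list: the holder stores only a reference, which defaults to null.
sealed class RefList { public int Count; }
class ClassHolder { public RefList List; }

static class Program
{
    static void Main()
    {
        var s = new StructHolder();
        Console.WriteLine(s.List.Count);    // 0: usable without any initialisation
        s.List.Add("a");                    // mutates the field in place
        Console.WriteLine(s.List.Count);    // 1

        var c = new ClassHolder();
        Console.WriteLine(c.List == null);  // True: c.List.Count would throw
    }
}
```

With the struct version, creating the holder is a single allocation and the list is immediately valid in its zeroed state. With the class version, the holder would have to allocate the list itself before first use.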

Love the diagrams! Unfortunately, there is no array of ItemList involved here, just two instances: — Simon Hewitt, Jun 01 '11 at 10:24

score 9 · Answer 2 · answered Jun 01 '11 at 06:57

For small, critical internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.

We can speculate that a value type avoids heap allocation and a reference type avoids struct copying, so it's not immediately clear either way: it depends very much on how the type is used.

The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.

Renatas M. · Answer 3 · 2011-06-01T06:59:32.600

Maybe it's important that... (a quote about structs from here):

The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.

Just thinking, don't judge me too harshly :)

edited Jun 01 '11 at 06:59

answered Jun 01 '11 at 06:53

Good point. However, when a count is associated with an array, chances are that the count keeps track of how many items in the array are in use. If you copy this struct and modify the count in the new struct, the original struct will still reference the same array but its count is not updated. So for this particular use of a struct I'm pretty sure that the developers have to be very careful __not__ to copy the struct. – Martin Liversage Jun 01 '11 at 07:15
That only applies to structures that only contain value types. As the structure contains an array, you will *not* get a copy of the data in the array, you will only get a copy of the *reference* to the array. – Guffa Jun 01 '11 at 07:38
@Martin: I believe you are correct about the Count tracking how many items in the array are in use. @Guffa: You are also correct and the struct is copied by the developers (Microsoft). – Simon Hewitt Jun 01 '11 at 10:31

Guffa · Answer 4 · 2011-06-01T11:08:48.297

There are really only two reasons to ever use a struct: either to get value type semantics, or for better performance.

As the struct contains an array, value type semantics don't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to take special care whenever the struct is copied so that you don't get inconsistent instances of it.

So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
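To make the copying hazard concrete, here is a small sketch. `DemoList<T>` is a hypothetical stand-in for the internal ItemList<T>, and its `Add` method and indexer are assumptions for illustration rather than the actual Microsoft code.

```csharp
using System;

// Hypothetical stand-in for ItemList<T>: an array plus a count of used slots.
struct DemoList<T> where T : class
{
    private T[] items;
    private int count;

    public int Count { get { return count; } }
    public T this[int index] { get { return items[index]; } }

    public void Add(T item)
    {
        if (items == null) items = new T[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

static class Program
{
    static void Main()
    {
        var original = new DemoList<string>();
        original.Add("a");

        var copy = original;   // copies the count and the array reference, not the array
        copy.Add("b");         // writes slot 1 of the shared array, bumps only copy's count

        Console.WriteLine(original.Count);  // 1
        Console.WriteLine(copy.Count);      // 2

        original.Add("c");     // overwrites slot 1, which copy believes holds "b"
        Console.WriteLine(copy[1]);         // c
    }
}
```

After the copy, the two values disagree about the count while still sharing storage, so a write through one silently changes what the other sees.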

One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:

ItemList<string>[] lists = new ItemList<string>[42];

As the items in the array are zero-filled, the count member will be zero and the items member will be null.
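A runnable sketch of the same idea follows. Because ItemList<T> is internal, it uses a hypothetical `DemoList<T>` with an assumed `Add` method, and compares it with an array of the reference type `List<string>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ItemList<T>; the zeroed state is a valid empty list.
struct DemoList<T> where T : class
{
    private T[] items;
    private int count;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (items == null) items = new T[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

static class Program
{
    static void Main()
    {
        // One allocation; every element is already an empty list.
        var structLists = new DemoList<string>[42];
        structLists[7].Add("x");   // an array element is a variable, so this mutates in place
        Console.WriteLine(structLists[7].Count);   // 1
        Console.WriteLine(structLists[8].Count);   // 0

        // With a reference type, every element starts as null and needs its own allocation.
        var classLists = new List<string>[42];
        Console.WriteLine(classLists[7] == null);  // True
    }
}
```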

edited Jun 01 '11 at 11:08

answered Jun 01 '11 at 07:34

Value-type semantics can work just fine with a struct that contains an array field if the code for the struct ensures that once a reference to an array is stored in the struct, no reference to that array will be held by code that would try to mutate it. For example, a `MutableValueArray` struct could have a `this[int]` setter which makes a copy of the array, modifies the copy, and overwrites its array reference with a reference to the copy. Add `Interlocked.CompareExchange` and it can even be thread-safe. – supercat Aug 28 '13 at 20:38
@supercat: Yes, that would work, but it would make it very inefficient. – Guffa Aug 28 '13 at 21:47
@Guffa: My point was not to describe an efficient structure, but rather demonstrate that mutable value semantics are achievable. If one wanted to simulate an array of significant size with value semantics, a tree structure could work pretty well (the extra costs of modification versus having code use an unshared array might be outweighed by the huge reduction in "copying" costs). – supercat Aug 28 '13 at 22:04
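The copy-on-write approach described in these comments could look roughly like the sketch below. The name `MutableValueArray` comes from the comment, but the constructor and everything else about its shape are assumptions; the `Interlocked.CompareExchange` variant is left out, and a default (zeroed) instance is not handled.

```csharp
using System;

// Sketch of a struct with value semantics over an array, using copy-on-write.
struct MutableValueArray<T>
{
    private T[] items;

    public MutableValueArray(int length) { items = new T[length]; }

    public T this[int index]
    {
        get { return items[index]; }
        set
        {
            // Never mutate an array that another copy of the struct may share:
            // clone it, modify the clone, then swap the reference.
            var copy = (T[])items.Clone();
            copy[index] = value;
            items = copy;
        }
    }
}

static class Program
{
    static void Main()
    {
        var a = new MutableValueArray<int>(3);
        a[0] = 1;

        var b = a;    // shares the array for now
        b[0] = 2;     // the setter clones first, so a is unaffected

        Console.WriteLine(a[0]);  // 1
        Console.WriteLine(b[0]);  // 2
    }
}
```

Every write costs a full array copy, which is the inefficiency raised in the reply above.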

Tony the Pony · Answer 5 · 2011-06-01T07:00:16.327

Purely speculating here:

Since the object is fairly small (only has two member variables), it is a good candidate for making it a struct to allow it to be passed as a ValueType.

Also, as @Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.

edited Jun 01 '11 at 07:00

answered Jun 01 '11 at 06:45

@Jen: Ok, but what is the benefit of passing it as a ValueType ? – Homam Jun 01 '11 at 06:46
Uhm, a question: because T is a class, can this struct be passed as ValueType? I'm wondering, really... – Marco Jun 01 '11 at 06:47
@Homam: Avoiding the heap allocation overhead. – Tony the Pony Jun 01 '11 at 06:47
@Marco: Yes. `items` is a reference to an array of `T`s. (This array is heap-allocated) – Tony the Pony Jun 01 '11 at 06:48
I believe the struct will also be in the heap (not all structs go on the stack) – Simon Hewitt Jun 01 '11 at 06:51
@Jen: so you say that, because array is already heap-allocated, items contains only a reference and it won't need more heap allocation? OK, it seems reasonable!! +1 for you, thanks!! – Marco Jun 01 '11 at 06:52
Speculating again, but I think that the author of `ItemList` intended it to be included into other objects via composition rather than by reference. And @Simon, yes, it may very well go onto the heap at some point. Storage is ultimately the decision of the VM. – Tony the Pony Jun 01 '11 at 06:54
The authors are Microsoft - this is their Linq code. It is used via composition but it is marked internal and only in the EntitySet class. – Simon Hewitt Jun 01 '11 at 10:34
