Performance of array of struct types

Question

Example:

// Potentially large struct.
struct Foo
{ 
   public int A;
   public int B;
   // etc.
}

Foo[] arr = new Foo[100];

If Foo is a 100 byte structure, how many bytes will be copied in memory during execution of the following statement:

int x = arr[0].A;

That is, is arr[0] evaluated to some temporary variable (a 100 byte copy of an instance of Foo), followed by the copying of .A into variable x (a 4 byte copy)?

Or is some combination of the compiler, JITer and CLR able to optimise this statement such that the 4 bytes of A are copied directly into x?

If an optimisation is performed, does it still hold when the items are held in a List<Foo> or when an array is passed as an IList<Foo> or an ArraySegment<Foo>?

Maybe your confusion will be gone if you realize that Array itself is a class (not struct). — Evk, Apr 28 '17 at 17:16
I have clarified the question, it is regarding the memory access bandwidth of arr[0].A — redcalx, Apr 28 '17 at 17:29
The question is still confusing. `arr[0].A` is a *variable*. That variable consumes four bytes; it's an integer variable, so what else could it consume? I don't know what this thing called "memory access" that you are referring to is. Are you asking how many bytes are read when you read from a four byte variable? Four. How many bytes are written when you write to a four-byte variable? Four. — Eric Lippert, Apr 28 '17 at 17:32
OK so my Q is: are there circumstances where arr[0] is evaluated first causing a 100 byte read, followed by .A, a further 4 byte read. E.g. consider ArraySegment. — redcalx, Apr 28 '17 at 17:35
@EricLippert To me the question makes sense though. Because although both are called indexers, `array[0].A` and `list[0].A` have very different behavior for `struct`s, don't you agree? — Ivan Stoev, Apr 28 '17 at 17:47
@IvanStoev: `list[0].A` is not a variable; is that the difference you mean? If that's the OP's question then they should more clearly *ask that question*. A code sample that clearly shows what operations they're interested in would help. — Eric Lippert, Apr 28 '17 at 17:54
@EricLippert I guess the question is for me :) Indeed, that's what I meant - for any other indexer property (prior to C# 7.0 of course) except the `Array`, `[0].A` will cause the whole struct to be copied and then the member to be accessed. – Ivan Stoev, Apr 28 '17 at 17:58
@EricLippert I think part of the issue here is that I wasn't aware of the deficiencies in my Q until I read the various criticisms of it; from which I have learned and attempted to improve/clarify the Q. I.e. my 'discussion and discovery' approach of using this site is at odds with how you're expecting the site to be used. In any case, I thank you for your answer as it did mostly cover my intended Q. — redcalx, Apr 28 '17 at 19:32
@IvanStoev I'm intrigued by your mention of C#7.0; is there a behaviour change in version 7? — redcalx, Apr 28 '17 at 20:13
@redcalx C# 7.0 added ref returns, which would let you write a custom object that could return a variable, rather than a value. — Servy, Apr 28 '17 at 20:29
@redcalx The behavior is the same. What I had in mind was that with the new [ref locals and returns](https://learn.microsoft.com/en-us/dotnet/articles/csharp/whats-new/csharp-7#ref-locals-and-returns) feature one can make own collection types with the same indexer semantics as the arrays, which was not possible before. But I'm afraid standard BCL collections/interfaces will not utilize that possibility due to backward compatibility policy. — Ivan Stoev, Apr 28 '17 at 20:31

score 10 · Accepted Answer · edited Oct 24 '17 at 08:32

Value types are copied by value -- hence the name. So then we must consider at what times a copy must be made of a value. This comes down to analyzing correctly when a particular entity refers to a variable, or a value. If it refers to a value then that value was copied from somewhere. If it refers to a variable then it's just a variable, and can be treated like any other variable.

Suppose we have

struct Foo { public int A; public int B; }

Ignore for the moment the design flaws here; public fields are a bad code smell, as are mutable structs.

If you say

Foo f = new Foo();

what happens? The spec says:

A new eight byte variable f is created.
A temporary eight byte storage location temp is created.
temp is filled in with eight bytes of zeros.
temp is copied to f.

But that is not what actually happens; the compiler and runtime are smart enough to notice that there is no observable difference between the required workflow and the workflow "create f and fill it with zeros", so that happens. This is a copy elision optimization.

EXERCISE: devise a program in which the compiler cannot copy-elide, and the output makes it clear that the compiler does not perform a copy elision when initializing a variable of struct type.

Now if you say

f.A = 123;

then f is evaluated to produce a variable -- not a value -- and then from that A is evaluated to produce a variable, and four bytes are written to that variable.

If you say

int x = f.A;

then f is evaluated as a variable, A is evaluated as a variable, and the value of A is written to x.

If you say

Foo[] fs = new Foo[1];

then variable fs is allocated, the array is allocated and initialized with zeros, and the reference to the array is copied to fs. When you say

fs[0].A = 123;

Same as before. fs[0] is evaluated as a variable, so A is a variable, so 123 is copied to that variable.

When you say

int x = fs[0].A;

same as before: we evaluate fs[0] as a variable, fetch from that variable the value of A, and copy it.
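A minimal sketch of the array case, to make "fs[0] is a variable" concrete. The `SetA` helper and the `ArrayElementDemo` class are hypothetical names introduced only for illustration; the point is that an array element can be written in place and passed by `ref`, which is only legal for variables.

```csharp
using System;

// Same shape as the Foo struct above, repeated so the sample is self-contained.
public struct Foo
{
    public int A;
    public int B;
}

public static class ArrayElementDemo
{
    // Hypothetical helper: receives an alias to a Foo variable,
    // so no copy of the struct is made at the call site.
    static void SetA(ref Foo foo, int value)
    {
        foo.A = value;
    }

    public static int Run()
    {
        Foo[] fs = new Foo[1];

        fs[0].A = 123;        // writes four bytes directly into the array element
        SetA(ref fs[0], 456); // compiles only because fs[0] is a variable

        return fs[0].A;       // reads the int field from the element in place
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 456
    }
}
```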

But if you say

List<Foo> list = new List<Foo>();
list.Add(new Foo());
list[0].A = 123;

then you will get a compiler error, because list[0] is a value, not a variable. You can't change it.

If you say

int x = list[0].A;

then list[0] is evaluated as a value -- a copy of the value stored in the list is made -- and then a copy of A is made in x. So there is an extra copy here.

EXERCISE: Write a program that illustrates that list[0] is a copy of the value stored in the list.
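One possible way to approach this exercise, as a sketch rather than the only answer: take the value returned by the indexer, mutate it, and observe that the list is unchanged until the whole struct is written back. `ListCopyDemo` is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Foo struct above, repeated so the sample is self-contained.
public struct Foo
{
    public int A;
    public int B;
}

public static class ListCopyDemo
{
    public static int[] Run()
    {
        var list = new List<Foo>();
        list.Add(new Foo());

        // list[0].A = 123;  // does not compile (CS1612): list[0] is not a variable

        Foo copy = list[0];              // the indexer getter returns a copy
        copy.A = 123;                    // mutates only the local copy
        int afterLocalWrite = list[0].A; // the element in the list is untouched

        list[0] = copy;                  // write the whole struct back via the setter
        int afterWriteBack = list[0].A;  // now the list holds the new value

        return new[] { afterLocalWrite, afterWriteBack };
    }

    public static void Main()
    {
        int[] results = Run();
        Console.WriteLine(results[0]); // prints 0
        Console.WriteLine(results[1]); // prints 123
    }
}
```

The read-modify-write pattern at the end is the usual workaround when a struct stored in a `List<T>` has to change.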

It is for this reason that you should (1) not make big structs, and (2) make them immutable. Structs get copied by value, which can be expensive, and values are not variables, so it is hard to mutate them.

What makes array indexer return a variable but list indexer not? Is array treated in a special way?

Yes. Arrays are very special types that are built deeply into the runtime and have been since version 1.

The key feature here is that an array indexer logically produces an alias to the variable contained in the array; that alias can then be used as the variable itself.

All other indexers are actually pairs of get/set methods, where the get returns a value, not a variable.
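The same distinction applies when an array is viewed through an interface, which the question asks about. The sketch below assumes a hypothetical `InterfaceViewDemo` class. It shows that `IList<Foo>` goes through a get/set pair even when the underlying object is an array, while `ArraySegment<Foo>` still exposes the array itself through its `Array` and `Offset` properties. Whether the JIT actually copies the full struct when reading one field through the getter is an implementation detail this sample does not measure.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Foo struct above, repeated so the sample is self-contained.
public struct Foo
{
    public int A;
    public int B;
}

public static class InterfaceViewDemo
{
    public static int[] Run()
    {
        Foo[] arr = new Foo[4];
        arr[2].A = 7;

        // The same array seen through IList<Foo>: the indexer is a get/set pair.
        IList<Foo> view = arr;
        // view[2].A = 9;             // does not compile (CS1612): view[2] is a value
        int viaInterface = view[2].A; // the getter logically returns a Foo copy

        // ArraySegment<Foo> keeps a reference to the underlying array, so the
        // element can still be reached as a variable via Array and Offset.
        var segment = new ArraySegment<Foo>(arr, 1, 2);
        segment.Array[segment.Offset + 1].A = 9; // this is arr[2].A, written in place

        return new[] { viaInterface, arr[2].A };
    }

    public static void Main()
    {
        int[] results = Run();
        Console.WriteLine(results[0]); // prints 7
        Console.WriteLine(results[1]); // prints 9
    }
}
```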

Can I create my own class to behave the same as array in this regard?

Before C# 7, not in C#. You could do it in IL, but of course then C# wouldn't know what to do with the returned alias.

C# 7 adds the ability for methods to return aliases to variables: ref returns. Remember, ref (and out) parameters take variables as their operands and cause the callee to have an alias to that variable. C# 7 adds the ability to do this to locals and returns as well.
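As a rough sketch of what that enables, here is a hypothetical `FooBuffer` type (not part of the BCL) whose indexer returns an alias to the element, so it behaves like an array indexer. It assumes a compiler that supports C# 7.0 or later.

```csharp
using System;

// Same shape as the Foo struct above, repeated so the sample is self-contained.
public struct Foo
{
    public int A;
    public int B;
}

// Hypothetical collection type used only for illustration.
public sealed class FooBuffer
{
    private readonly Foo[] items;

    public FooBuffer(int length)
    {
        items = new Foo[length];
    }

    // ref return: hands back an alias to the array element, not a copy of it.
    public ref Foo this[int index] => ref items[index];
}

public static class RefReturnDemo
{
    public static int Run()
    {
        var buffer = new FooBuffer(1);

        buffer[0].A = 123;             // compiles: buffer[0] is a variable here

        ref Foo first = ref buffer[0]; // ref local aliasing the same element
        first.B = 1;

        return buffer[0].A + buffer[0].B;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 124
    }
}
```

A ref-returning indexer like this cannot also declare a setter, so assignment of a whole element is written as `buffer[0] = new Foo();`, which assigns through the returned alias.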

What makes array indexer return a variable but list indexer not? Is array treated in a special way? Can I create my own class to behave the same as array in this regard? — Evk, Apr 28 '17 at 19:16
And does an array accessed via a ref of type IList also yield a variable? – redcalx, Apr 28 '17 at 19:35
@redcalx Write up the code and try to compile it to find out. — Servy, Apr 28 '17 at 19:37
@Servy I am doing, but I would also like to have a full understanding of the intent and design to distinguish between what is guaranteed to happen, and what may be the consequence of an optional optimisation. I think Eric's answer mostly covers it though. — redcalx, Apr 28 '17 at 19:41
@redcalx If a variable is returned, then the code works if you try to write to it, if a variable *isn't* returned, then the code can't compile. There's no optional optimizations at play. Either it lets you write to it or it doesn't. Optimizations can only improve speed without changing functionality, not change behavior. — Servy, Apr 28 '17 at 19:43

Joel Coehoorn · Answer 2 · 2017-04-28T17:56:50.687

score 2

The entire struct is already in memory. When you access arr[0].A, you aren't copying anything, and no new memory is needed. You're looking up an object reference (that might be on the call stack, but a struct might be wrapped by a reference type on the heap, too) for the location of arr[0], adjusting for the offset of the A field, and then accessing only that integer. There is no need to read the full struct just to get A.

Neither List<Foo> or ArraySegment<Foo> really changes anything important here so far.

However, if you were to pass arr[0] to a function or assign it to a new variable, that would result in copying the Foo object. This is one difference between a struct (value type) and a class (reference type) in .Net; a class would only copy the reference, and List<Foo> and ArraySegment<Foo> are both reference types.

In .Net, especially as a newcomer to the platform, you should strongly prefer class over struct most of the time, and it's not just about copying the full object vs copying the reference. There are some other subtle semantic differences that even I admittedly don't fully understand. Just remember that class > struct until you have a good empirical reason to change your mind.

edited Apr 28 '17 at 17:56

answered Apr 28 '17 at 17:14

Joel Coehoorn

Hi, I've clarified the question. – redcalx Apr 28 '17 at 17:26
And I've clarified the answer. – Joel Coehoorn Apr 28 '17 at 17:57
"Neither List or ArraySegment really changes anything important here so far." - my reading of Eric's answer is that he's saying that this is not the case; care to comment? (I would be interested in your interpretation/view of his answer). – redcalx Apr 28 '17 at 19:39
No, they are the same, because in his words you're still working with variables rather than values. – Joel Coehoorn Apr 28 '17 at 19:44
Hmm, the compiler disagrees. `arr[0].A = 123` compiles because `arr[0]` is a variable, whereas `list[0]` is not a variable, it yields a value, i.e. a copy of the element at [0]. Thus changing that copy would not change the value in the list, hence the compiler detects it as having no effect and thus flags it as an error. This is also the case for an array pointed to by a ref of type IList or an ArraySegment – redcalx Apr 28 '17 at 20:09
I think you'll find `list[0].A = 123` does change the value in the list, and if it's not compiling take a closer look at the compiler error. – Joel Coehoorn Apr 28 '17 at 20:14
I have tried it and it doesn't compile, for the reasons Eric described. "Cannot modify the return value of 'List.this[int]' because it is not a variable" – redcalx Apr 28 '17 at 20:17
