Initialize array of structs with unknown length fields

Question

Is there value in attempting to pre-allocate an array of structs when the size of the fields is variable? For example:

A.x = randn(1,randi(100));
A.y = randn(1,randi(100));

for k = 2:1000
    A(k).x = randn(1,randi(100));
    A(k).y = randn(1,randi(100));
end

I could create the first entry and then use repmat, but MATLAB would still have to deal with the unknown field lengths. In my tests there is little/no improvement compared to just letting it grow dynamically. Incidentally, growing it with brackets (e.g. A = [A nextEntry]) is much slower.

Is there a clever way to do a pre-alloc to speed this up?

Maybe this post will help: http://stackoverflow.com/questions/28664640/matlab-vectorization-filling-struct-fields-from-vector-elements — rayryeng, Jun 28 '16 at 21:32
You don't have to initialize the *value* of the fields just that there are fields. The values are stored elsewhere in memory. — Suever, Jun 28 '16 at 21:35

Suever · Answer 1 · 2016-06-28T21:48:35.313

MATLAB stores a struct array in two parts. The meta-data about the struct (the dimensions, the field names, etc.) is stored in one place in memory, and the contents (values) of each field are stored separately. Pointers to those contents are kept in the meta-data so that the values can be located when requested.

For this reason, if you want to initialize a struct you can initialize it with all the contents set to []. You only need to ensure that the number of fields and the dimensions of the initial struct are correct, so that there is enough space to store all of the pointers to the data that it will eventually contain.

Then you can fill in the fields as needed. Each value is assigned to fresh memory, and its pointer is stored in the meta-data in the pre-allocated location.

A relevant article from Loren's blog

So in your case you can simply pre-allocate your struct with:

A = struct('x', cell(1, 1000), 'y', cell(1, 1000));

And fill it with:

for k = 1:numel(A)
    A(k).x = randn(1, randi(100));
    A(k).y = randn(1, randi(100));
end
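As a quick check of what the pre-allocation produces, the sketch below inspects the struct array before it is filled and then assigns fields of different lengths. The repmat form is shown only as an assumed alternative way to get the same shape; the variable names n and B are hypothetical and not part of the original answer.

```matlab
n = 1000;  % number of elements, matching the example above

% Form used above: every entry of cell(1, n) becomes one struct element,
% and each field starts out as an empty array []
A = struct('x', cell(1, n), 'y', cell(1, n));

% Assumed alternative: replicate a single struct whose fields are empty
B = repmat(struct('x', [], 'y', []), 1, n);

% Both are 1-by-n struct arrays with empty fields
disp(size(A));
disp(size(B));
disp(isempty(A(1).x) && isempty(B(1).x));

% Field lengths can differ freely from element to element
A(1).x = randn(1, 5);
A(2).x = randn(1, 50);
fprintf('A(1).x has %d values, A(2).x has %d values\n', ...
        numel(A(1).x), numel(A(2).x));
```

Only the shape of the struct array and its field names are fixed up front; the field contents are assigned later, one element at a time.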

As for why growing A using [A newA] is slower: it forces MATLAB to "grow" the meta-data component of the struct each time through the loop, which requires making an entire copy of the meta-data for every expansion.
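A rough way to compare the three approaches on your own machine is a tic/toc sketch like the one below. The element count n and the variable names are assumptions chosen for illustration. Timings depend on the MATLAB release and hardware, so no particular result is implied here.

```matlab
n = 2000;  % number of elements; an arbitrary choice for illustration

% (a) Pre-allocated struct array, filled by index
tic;
A = struct('x', cell(1, n), 'y', cell(1, n));
for k = 1:n
    A(k).x = randn(1, randi(100));
    A(k).y = randn(1, randi(100));
end
tPre = toc;

% (b) No pre-allocation, grown by indexed assignment
tic;
B = struct('x', {}, 'y', {});   % 0-by-0 struct array with fields x and y
for k = 1:n
    B(k).x = randn(1, randi(100));
    B(k).y = randn(1, randi(100));
end
tIdx = toc;

% (c) No pre-allocation, grown by concatenation
tic;
C = struct('x', {}, 'y', {});
for k = 1:n
    next = struct('x', randn(1, randi(100)), 'y', randn(1, randi(100)));
    C = [C next]; %#ok<AGROW>   % builds a new struct array on every pass
end
tCat = toc;

fprintf('pre-allocated: %.4f s\n', tPre);
fprintf('indexed growth: %.4f s\n', tIdx);
fprintf('concatenation: %.4f s\n', tCat);
```

For more stable numbers, each variant could be wrapped in a function and measured with timeit instead of a single tic/toc pass.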

Thanks for the reply. I guess part of my question is "why bother", when MATLAB will have to go find memory for the contents anyway. It can pre-allocate space for the pointers, but the pointer could be to a large array, which it needs to deal with in real time. Or do I have that wrong? — Mastiff, Jun 28 '16 at 23:10
@user2364295 The pointers (stored in the meta-data) are the same size regardless of the size of the data they point to. If you don't reserve space for all of these pointers up-front, MATLAB has to *move* the meta-data every time you add a new element (which is why you saw a performance hit when you did that). And yes, when you assign data it has to allocate memory for that item, but you don't want to add to that the need to re-allocate the entire meta-data structure. You don't save any time by pre-allocating the contents themselves up-front. — Suever, Jun 28 '16 at 23:11
