19

If I want to store some strings or matrices of different sizes in a single variable, I can think of two options: I could make a struct array and have one of the fields hold the data,

structArray(structIndex).structField

or I could use a cell array,

cellArray{cellIndex}

but is there a general rule of thumb for when to use which data structure? I'd like to know if there are downsides to using one or the other in certain situations.

CJS

4 Answers

17

In my opinion, it's more a matter of convenience and code clarity. Ask yourself whether you would prefer to refer to your variable's elements by number(s) or by name, then use a cell array in the former case and a struct array in the latter. Think of it as a table without and with headers.

By the way you can easily convert between structures and cells with CELL2STRUCT and STRUCT2CELL functions.
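As a minimal sketch of that conversion (the field names and sample values here are made up for illustration, not taken from the question):

```matlab
% Hypothetical sample data: one row per record, columns are name and values.
c = {'alpha', [1 2 3]; 'beta', magic(4)};
fields = {'name', 'values'};     % assumed field names, one per column of c

% Dimension 2 of c (its columns) becomes the fields of the struct array.
s = cell2struct(c, fields, 2);   % 2x1 struct array

disp(s(2).name)                  % access by name instead of c{2,1}

% And back again: one column of cells per struct element,
% so the result is the transpose of the original cell array.
c2 = struct2cell(s);
disp(isequal(c2, c.'))
```

Note that the round trip keeps the data but drops the field names, which is the "table without headers" side of the trade-off.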

yuk
  • With cell arrays you need some metadata in order to identify cell content. Carefully chosen field names make your code self-explanatory. – zellus Sep 03 '10 at 16:55
10

If you use it for computation within a function, I suggest you use cell arrays, since they're more convenient to handle, thanks e.g. to CELLFUN.
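A small sketch of what that convenience looks like, assuming a cell array of differently sized matrices (the data and the field name `data` are invented for the example):

```matlab
% Hypothetical data: matrices of different sizes in one cell array.
c = {rand(2,3), rand(5,1), []};

% Scalar result per cell: the output is an ordinary numeric array.
n = cellfun(@numel, c);

% Non-scalar results need 'UniformOutput', false and come back as a cell.
t = cellfun(@transpose, c, 'UniformOutput', false);

% One way to do the same over a struct array (assumed field name "data").
s = struct('data', c);                 % 1x3 struct array, one element per cell
m = arrayfun(@(e) numel(e.data), s);
```

The struct version works too, but it needs an anonymous function to reach into the field, which is part of why cell arrays tend to feel lighter for this kind of processing.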

However, if you use it to store data (and return output), it's better to return structures, since the field names are (should be) self-documenting, so you don't need to remember what information you had in column 7 of your cell array. Also, you can easily include a field 'help' in your structure where you can put some additional explanation of the fields, if necessary.

Structures are also useful for data storage because, if you want to update your code at a later date, you can replace them with objects without needing to change the code that uses them (at least if you pre-assigned your structure). They have the same syntax, but objects allow you to add more functionality, such as dependent properties (i.e. properties that are calculated on the fly from other properties).

Finally, note that cells and structures add a few bytes of overhead to every field. Thus, if you want to use them to handle large amounts of data, you're much better off using structures/cells that contain arrays than large arrays of structures/cells whose fields/elements contain only scalars.
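One way to see this overhead for yourself is to store the same numbers three ways and compare what `whos` reports. This is only a sketch: the size `N` and the variable names are arbitrary, and the exact byte counts depend on the MATLAB version and platform.

```matlab
N = 1e4;                          % illustrative size, chosen arbitrarily
x = rand(1, N);

% One scalar struct holding a single N-element array.
soa.value = x;

% N structs, each holding one scalar.
aos = struct('value', num2cell(x));

% N cells, each holding one scalar.
c = num2cell(x);

% whos with an output argument returns a struct that has a "bytes" field.
infoSoa = whos('soa');
infoAos = whos('aos');
infoC   = whos('c');
fprintf('struct of array:  %d bytes\n', infoSoa.bytes);
fprintf('array of structs: %d bytes\n', infoAos.bytes);
fprintf('cell of scalars:  %d bytes\n', infoC.bytes);
```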

Jonas
6

The code below suggests that cell arrays may be roughly twice as fast as structs for assignment and retrieval. I did not time the two operations separately, but the code could easily be modified to do that.

Running "whos" afterwards suggests that they use very similar amounts of memory.

My goal was to make a "list of lists" in Python terminology, or perhaps an "array of arrays".

I hope this is interesting/useful!

%%%%%%%%%%%%%%  StructVsCell.m %%%%%%%%%%%%%%%

clear all

M = 100; % number of repetitions
N = 2^10; % size of cell array and struct


for m = 1:M
    % Fill up a template cell array with
    % lists of randomly sized matrices with
    % random elements.
    template{N} = 0;
    for n = 1:N
        r1 = round(24*rand());
        r2 = round(24*rand());
        r3 = rand(round(r2*rand),round(r1*rand()));
        template{n} = r3;
    end

    % Make a cell array equivalent
    % to the template.
    cell_array = template;

    % Create a struct with the
    % same data.
    structure = struct('data',0);
    for n = 1:N
        structure(n).data = template{n};
    end

    % Time cell array
    tic;
    for n = 1:N
        data = cell_array{n};
        cell_array{n} = data';
    end
    cell_time(m) = toc;

    % Time struct
    tic;
    for n = 1:N
        data = structure(n).data;
        structure(n).data = data';
    end
    struct_time(m) = toc;
end

str = sprintf('cell array: %0.4f',mean(cell_time));
disp(str);
str = sprintf('struct: %0.4f',mean(struct_time));
disp(str);
str = sprintf('struct_time / cell_time: %0.4f',mean(struct_time)/mean(cell_time));
disp(str);

% Check memory use
whos

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
abalter
  • Oh, and thanks @jonas for telling me about CELLFUN. I did not know about that function and will use it in the code I'm working on right now. – abalter Dec 06 '10 at 00:16
  • According to your code, `struct` is four times faster in my case. – embert Aug 21 '14 at 14:48
3

First and foremost, I second yuk's answer. Clarity is generally more important in the long run.

However, you may have two more options depending on how irregularly shaped your data is:

Option 3: structScalar.structField(fieldIndex)

Option 4: structScalar.structField{cellIndex}

Among the four, #3 has the least memory overhead for large numbers of elements (it minimizes the total number of matrices), and by large numbers I mean >100,000. If your code lends itself to vectorizing on structField, it is probably a performance win, too. If you can't collect each element of structField into a single matrix, option 4 has the notational benefits without the memory & performance advantages of option 3. Both of these options make it easier to use arrayfun or cellfun on the entire dataset, at the expense of requiring you to add or remove elements from each field individually. The choice depends on how you use your data, which brings us back to yuk's answer -- choose what makes for the clearest code.
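Options 3 and 4 might look like this in practice. The field names (`id`, `weight`, `data`) and values are hypothetical, picked only to show the access patterns:

```matlab
% Option 3: scalar struct whose fields are each one regular array.
s3.id     = [1 2 3];
s3.weight = [0.5 1.5 2.5];
heavy = s3.weight > 1;              % vectorized over the whole field
ids   = s3.id(heavy);               % ids of the elements that passed the test

% Option 4: scalar struct with a cell-array field for irregular items.
s4.id   = [1 2 3];
s4.data = {rand(2), 'text', []};
sizes = cellfun(@numel, s4.data);   % works on the whole field at once

% The cost: removing element 2 means editing every field individually.
s4.id(2)   = [];
s4.data(2) = [];
```

With a struct array (option 1), the same removal would be a single `structArray(2) = [];`, which is the trade-off described above.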

Arthur Ward