Is it efficient to use length() in loops?

Question

I have a lot of cases when I iterate over a vector in MATLAB,

so I do this :

len = length(vector)
for i=1:len
 do_something();
end

But this is just an intuition that says "prevent calling the length() function every time". Is it true? Or is it the same thing as this, in terms of time requirement :

for i=1:length(vector)
 do_something();
end

Thanks for any help!

I can, but I also want to learn about MATLAB's way of compiling the code. — jeff, May 15 '14 at 12:09
If you measure it, do try `numel` also. It may be faster than `length` — Luis Mendo, May 15 '14 at 12:14
It would be very appreciated if you take a simple test and compare the elapse-times of the suggestions made above, you could use the ´tic toc´ function from matlab. And then show us the results :). And I would go for the `numel` too. — Sergio Haram, May 15 '14 at 12:18
I doubt it would call it every time, I'm pretty sure the interpreter will first create a vector of 1:length(vector) and then only start the iterating process based on that vector. I can't imagine that you'd see any performance difference calling length before. It's not like you're calling it inside the loop itself, that might be different. — Dan, May 15 '14 at 12:54
@Dan that's what I want to learn. I also think so, but I want to know :) — jeff, May 15 '14 at 12:59
(@Dan I'm not sure about the interpreter creating the vector) I wonder if anything useful can be derived from this: The iterator can be given a larger number of steps than can be defined in a variable, for example (on my system) `crtl-c` whilst running `for a=1:inf;a+1;end;` yields *Warning: FOR loop index is too large. Truncating to 9223372036854775807.* however declaring `a=1:9223372036854775807` fails as **Maximum variable size allowed by the program is exceeded.** — RTL, May 15 '14 at 13:16
@Dan, it doesn't work as simply as that. See my answer for more detail. — Sam Roberts, May 15 '14 at 13:30
@RTL: 9223372036854775807 is `intmax('int64')` (and not `realmax()`). From that I think we can infer that Matlab's `for` loop implementation (for large values) internally uses a 64-bit `int` counter, probably to keep track of the number of iterations in the loop. Seems that you only get the warning if (B-A)/S in `for i = A:S:B` is large, not just if B is large. E.g. `for i = 1:1e24` gives that warning, but `for i = 1e25:1e25+5*eps(1e25)` and `for i = 1:1e24:1e25` do not. — Andrew Janke, May 16 '14 at 03:56
@AndrewJanke interesting, I really should have recognised that as `intmax('int64')`, and +1 for internal 64bit loop counter... that sounds likely, especially as from my interpretation of the rest of your comment it seems dependent the number of elements not the values — RTL, May 16 '14 at 16:12

Sam Roberts · Accepted Answer · 2014-05-15T13:45:10.390

If you're concerned about performance or speed - don't worry about it, really. It will not make any appreciable difference. @David gives some timings in his answer that back this up.

In terms of aesthetics, for what it's worth I would normally write

for i = 1:numel(v)
    doSomething(v(i))
end

I tend to prefer numel to length, as length gives you the length of longest array dimension, rather than the length of the first dimension. If you're always sure it's a vector they're the same, but I try to avoid that assumption.

Of course, if I needed access to numel(v) separately, then it would be worth extracting it to an intermediate variable vNum and writing for i = 1:vNum.

If you're interested in the semantics of the for loop indices, it's more complex than you think. When you write for i = 1:N, MATLAB does not always simply create the vector 1:N and then iterate through it. For example, you can write

for i = 1:realmax
    i
end

or

for i = 1:Inf
    i
end

or even

for i = Inf:-1:1
    i
end

and it will work (press Ctrl-C to escape). MATLAB doesn't create the loop indices as a vector in these case, and it couldn't, as it would be too large.

The parser has a more complex logic than that, with knowledge of several edge cases, and optimizations. Sometimes it will create the loop indices ahead of the loop, sometimes on the fly. I guarantee you will not be able to second-guess all the internals, unless you have access to the MATLAB source code.

Note also that the loop indices do not have to be a vector; if you supply an array, MATLAB loops through the columns. For example,

for i = magic(3)
    i
end

displays the columns of the array in turn. Similarly you can supply a cell array and it will iterate through the cells (NB the indices are the cells, not the elements within the cells).

For this reason, I would sometimes write

for vElement = v
    doSomething(vElement)
end

rather than the first pattern above using numel.

And of course, depending on the application, it may well be better to vectorize doSomething and call doSomething(v).

One last thing - we're talking so far about integer indices really, but remember that the colon operator can have any increment, such as pi:sqrt(2):10; and remember that the colon operator is not the same as linspace, and will not give you the same answers.

@excaza For arrays of built-in types, `numel(x)` is just the same as `prod(size(x))`. For arrays of user-defined classes, it is also this, unless you have overloaded `numel`. — Sam Roberts, May 15 '14 at 14:45
Oh, duh. I'm so used to using `numel` along a specific dimension that I forgot what it's actually returning. Whoops. — sco1, May 15 '14 at 14:57
@SamRoberts: One caveat: for user-defined classes, `numel(x)` (and `length(x)`) will ignore a user-defined `size(x)` method, so if you overload one, you effectively have to overload all the size-and-shape related functions to get consistent behavior. (I.e. `numel(x)` is more like `prod(builtin('size', x))`.) — Andrew Janke, May 16 '14 at 00:07
+1, and a hearty second for preferring `numel` to `length`. I've run in to a bunch of bugs in practice where `length` was used in something that eventually got passed a non-vector input. — Andrew Janke, May 16 '14 at 01:03
I thought that length also gives the longer dimension. Is it wrong ? — jeff, May 18 '14 at 11:46
@CengizFrostclaw, yes, `length` gives the size of the longest dimension of the array (unless the array is empty, in which case it gives zero). — Sam Roberts, May 18 '14 at 13:52

score 1 · Answer 2 · answered May 15 '14 at 12:32

1

I ran a test, first with JIT compilation on (I ran each test 3 times)

length(vector) 12.86 12.50 12.44    
N              13.00 12.52 12.55   
numel(vector)  12.83 12.55 12.56

and with JIT compilation off,

length(vector) 13.12 13.43 12.95   
N              12.54 13.04 13.00   
numel(vector)  12.57 12.92 12.72

I conclude that with JIT compilation on, it doesn't matter what you do, but without JIT then numel of N are best, but it doesn't seem to make a big difference. I can't explain this, and maybe this test is flawed, but there you go.

Here's the code:

clear
N=5e4;
A=rand(1e2);
tic()
vector=ones(1,N);
for i=1:length(vector)
    inv(A);
end
toc()
tic()
for i=1:N
    inv(A);
end
toc()
tic()
vector=ones(1,N);
for i=1:numel(vector)
    inv(A);
end
toc()

answered May 15 '14 at 12:32

David

8,449
1
22
32

A possible explanation: `numel` is more "basic" than `length` because `numel` directly measures the total bulk of memory used by the array; whereas `length` has to determine how that memory is organized into different dimensions, find the "extent" along each dimension, and then give the maximum of those "extents". – Luis Mendo May 15 '14 at 12:40
There is quite a bit of variability, so I'm not sure how conclusive the results are. It could just be a bad test. – David May 15 '14 at 12:43
1

Well, computing the inverse takes much more time than what you want to measure. So I suggest replacing that line by a very simple operation. That way differences (if any) will stand out more – Luis Mendo May 15 '14 at 12:47
Also, the definition of the vector should be outside tic-toc, so that it doesn't affect – Luis Mendo May 15 '14 at 12:50
1

And in addition to what Luis Mendo said I'd also run test much more than three times (with average of all results). So few trials are affected by environment (CPU usage, other programs, disk I/O, Windows updates, tic/toc resolution and accuracy...) – Adriano Repetti May 15 '14 at 12:50
2

@LuisMendo `numel` is implemented as though it were `prod(size(x))`, unless you have overloaded it for your own class. It isn't calculated via the memory used for the array. `length` also uses `size`, but returns the longest dimension of the array (basically `max(size(a))`, unless the array is empty, in which case it's always zero). Of course, they're all builtins, and don't directly call those MATLAB functions. – Sam Roberts May 15 '14 at 14:47
Based on Matlab's external interfaces doco and people's attempts to reverse-engineer the mxarray type, it looks like the number of elements and the size lists are each stored separately in the mxarray header. So `numel` probably reads nelements directly and is *slightly* simpler. But it doesn't matter – `numel`, `size`, `length`, `ndims` are all about as fast as a Matlab function can be, because those C-level operations are probably going to be swamped by the overhead of the M-code function call itself, even as a builtin. – Andrew Janke May 16 '14 at 00:46
@David: Matlab's statistical functions would let you test these results for significance to see if any conclusions could be drawn from them. – Andrew Janke May 16 '14 at 00:54

Andrew Janke · Answer 3 · 2014-05-16T04:01:31.330

They're the same. In the specific case of Matlab, don't worry about it: putting length() or any other function in the for initialization clause is always just as fast as evaluating it outside the loop, because for will only call it once either way. Your intuition is probably based on some other languages' for loops like C and Java, which have different behavior.

By definition, the Matlab for loop evaluates its argument expressions only once, at the beginning of the loop, to pre-compute a range or array of values for the loop index variable (i) to take on inside the loop passes. Unlike many other languages, Matlab's for does not re-evaluate some of the loop control statements each time through the loop. (This is also why assigning to the loop index variable inside the body of a Matlab for loop has no effect, where in C or Java it will allow you to "jump around" and alter the control flow.)

Have a read through the Matlab for documentation. It could stand to be more explicit about it, but you'll notice it's defined in terms of the values that the expressions resolve to, and not the expressions themselves.

Syntax equivalence

A C for loop is defined to have this behavior.

/* C-style for loop */
for ( A; B; C; ) {
    ...
}

/* basically equivalent to: */
{
    A;
    while ( B ) {
        ....
        C;
    }
}

Functionally, Matlab's for loop syntax equivalence is more like this.

% Matlab for loop

for i = A:B
    ...
end

% basically functionally equivalent to:

tmp_X = A:B;         % A and B only get evaluated outside the loop!
tmp_i = 1;           % tmp_* = an implicit variable you have no access to
while tmp_i < size(tmp_X,2)
    i = tmp_X(:,tmp_i);
    ...
    tmp_i = tmp_i + 1;
end

And in practice, Matlab can optimize away the creation of the concrete array tmp_X in the case of primitive values. This decoupling of the loop body from the control expressions also helps support the parallel parfor loop used with the Parallel Computing Toolbox, because the value of the loop index variable for every loop iteration is known before the loop starts, and independent of the execution of any of the loop passes.

Demonstration

You can confirm this behavior yourself by using a function that has an observable side effect in the loop control clause.

function show_for_behavior

for i = 1:three(NaN)
    disp(i);
end

function out = three(x)
disp('three() got called');
out = 3;

You can see there was only one invocation for the whole loop.

>> show_for_behavior
three() got called
     1
     2
     3

Language reasons

Here's where I speculate a bit.

Beyond convenience, I suspect one of the reasons Matlab defines its for loop the way it does, instead of providing you the C style syntactic sugar over a regular while loop, is that it's tricky to get the index variable right due to floating-point roundoff. By default, the numeric loop variables you're working with are doubles, and for large values of x (around 10^15), x + 1 == x, because the relative precision at x (eps(x)) is greater than 1.

So if you do the naive while-loop transformation of for i = A:B ... end like this, you'll have an infinite loop, because at each step, i = i + 1 will result in the same value of i due to rounding.

i = A;
while (i < B)
    ...
    i = i + 1;
end

To be able to perform loops over sequences of large values, you can compute the value range and number of steps, keep track of the loop index using a separate integer value, and construct the i value for each step using that counter and the step size, instead of incrementing a temporary variable on each pass. Something like this.

% original
for x = A:S:B; ...; end

% equivalent
nSteps = int64( ((B - A) / S) ) + int64(1);
i = int64(0);
while i < nSteps
    x = A + (S * double(i));
    ....
    i = i + int64(1);
end

You can only do this if the range min, max, and step are defined ahead of time for all passes, which isn't guaranteed with the more flexible while-loop form.

Note that in this case, for large A and B, x may have the exact same value for multiple iteration passes, but will eventually progress, and you'll get about as many loop iterations as you would expect if you were using infinite-precision values instead of approximate floating-point values. I suspect this is about what Matlab does internally in these cases.

Here's an example showing this behavior.

function big_for_loop(a)

if nargin < 1;  a = 1e20;  end
b = a + 4 * eps(a);
step = 15;

fprintf('b - a = %f\n', b - a);
fprintf('1 + (b - a) / step = %f\n', 1 + (b - a) / step);

last_i = a;
n = 0;
for i = a : step : b
    n = n + 1;
    if (i ~= last_i); disp('i advanced');  end
    last_i = i;
end
fprintf('niters = %d\n', n);

When I run this, i changes about as you would expect based on eps if this is how Matlab is doing the loop.

>> big_for_loop
b - a = 65536.000000
1 + (b - a) / step = 4370.066667
i advanced
i advanced
i advanced
i advanced
niters = 4370

Is it efficient to use length() in loops?

3 Answers3

Syntax equivalence

Demonstration

Language reasons