56

Given a matrix m = [10i+j for i=1:3, j=1:4], I can iterate over its rows by slicing the matrix:

for i=1:size(m,1)
    print(m[i,:])
end

Is this the only possibility? Is it the recommended way?

And what about comprehensions? Is slicing the only possibility to iterate over the rows of a matrix?

[ sum(m[i,:]) for i=1:size(m,1) ]
Nico
  • mapslices? `mapslices(sum, m, 2)` does the latter – jverzani Feb 14 '14 at 19:25
  • @jverzani mapslices does the job, although in some cases it will require I define an anonymous function. Thanks for the suggestions. – Nico Feb 14 '14 at 21:29
  • For any new readers, make sure you check out the answer by Seanny123, as it contains a good solution for v1.1+ that was not originally available when this question was asked and answered. – Colin T Bowers Oct 26 '22 at 10:37

4 Answers

68

The solution you listed yourself, as well as mapslices, both work fine. But if by "recommended" what you really mean is "high-performance", then the best answer is: don't iterate over rows.

The problem is that since arrays are stored in column-major order, for anything other than a small matrix you'll end up with a poor cache hit ratio if you traverse the array in row-major order.
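To make the storage order concrete, here is a small sketch using the matrix from the question. It assumes an ordinary dense `Matrix`; the variable names are just for illustration.

```julia
# Column-major layout: consecutive memory locations walk down a column first.
m = [10i + j for i = 1:3, j = 1:4]   # the 3×4 matrix from the question

# vec(m) lists the elements in storage order.
storage_order = vec(m)

first_column = m[:, 1]
first_row = m[1, :]

# The first stored elements form the first column, not the first row.
println(storage_order[1:3] == first_column)  # true
println(storage_order[1:4] == first_row)     # false
```

So reading along a row means jumping `size(m, 1)` elements between consecutive accesses, which is where the cache misses come from.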

As pointed out in an excellent blog post, if you want to sum over rows, your best bet is to do something like this:

msum = zeros(eltype(m), size(m, 1))
for j = 1:size(m,2)
    for i = 1:size(m,1)
        msum[i] += m[i,j]
    end
end

We traverse both m and msum in their native storage order, so each time we load a cache line we use all the values, yielding a cache hit ratio of 1. You might naively think it's better to traverse it in row-major order and accumulate the result to a tmp variable, but on any modern machine the cache miss is much more expensive than the msum[i] lookup.

Many of Julia's internal algorithms that take a dims keyword, like sum(m; dims=2), handle this for you.
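As a quick check, the loop above can be wrapped in a function and compared against the built-in reduction. This is only a sketch: `rowsums` is a hypothetical helper name, and the `dims` keyword form assumes Julia 1.0 or later.

```julia
m = [10i + j for i = 1:3, j = 1:4]

# Hypothetical helper wrapping the column-major loop shown above.
function rowsums(m::Matrix)
    msum = zeros(eltype(m), size(m, 1))
    for j in 1:size(m, 2)        # outer loop over columns
        for i in 1:size(m, 1)    # inner loop walks down one column
            msum[i] += m[i, j]
        end
    end
    return msum
end

loop_result = rowsums(m)

# The built-in reduction returns a 3×1 Matrix, so flatten it to compare.
builtin_result = sum(m; dims=2)
flat = vec(builtin_result)

println(loop_result == flat)  # true
```

Note that `sum(m; dims=2)` keeps the reduced dimension, so the result is a 3×1 matrix rather than a vector.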

tholy
  • I think this answers my question, I will wait another day to accept the answer. I like this answer very much because it's made me realise that since Julia is column-major, I better arrange my data vectors as columns rather than rows. – Nico Feb 15 '14 at 16:27
  • The blog post you've linked to no longer exists. See http://docs.julialang.org/en/release-0.4/manual/performance-tips/#access-arrays-in-memory-order-along-columns instead. – aventurin May 09 '16 at 18:42
  • 404 seems due to trailing slash. This URL works: http://julialang.org/blog/2013/09/fast-numeric – Isaiah Norton May 25 '16 at 20:03
  • That was correct. but what about a 3-dimensional array? e.g: `A[i,j,k]` . What is the order? `k --> j --> i` or `i --> j --> k` ? – Alireza Ghavaminia May 26 '18 at 03:27
  • Dimensions are ordered fastest-to-slowest. – tholy Jun 26 '22 at 10:50
26

As of Julia 1.1, there are iterator utilities for iterating over the columns or rows of a matrix. To iterate over rows:

M = [1 2 3; 4 5 6; 7 8 9]

for row in eachrow(M)
    println(row)
end

Will output:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
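The same iterator also covers the comprehension from the question. A minimal sketch, assuming Julia 1.1 or later; the variable names are only illustrative:

```julia
M = [1 2 3; 4 5 6; 7 8 9]

# Row sums with a comprehension, no explicit slicing needed.
row_sums = [sum(row) for row in eachrow(M)]

# enumerate supplies the row index alongside each row.
indexed = [(i, sum(row)) for (i, row) in enumerate(eachrow(M))]

# eachcol is the column counterpart.
col_sums = [sum(col) for col in eachcol(M)]

# Each row is a view into M rather than a copy, so writes go through.
row1 = first(eachrow(M))
row1[1] = 100
println(M[1, 1])  # 100
```

Because the rows are views, no data is copied per iteration, but each row is still read across columns, so the memory-order caveat from the other answer applies.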
Seanny123
  • Is there any way to also get the row indices using this method? That is, not just each row itself, but also its index. – Skumin Mar 13 '20 at 16:48
  • Skumin, you can use `for (i, row) in enumerate(eachrow(M))`. In fact, you can apply `enumerate` to any iterator – Reiner Martin Mar 02 '21 at 07:55
  • What are the performance implications of using `eachrow(df)`? Is it comparable to a naive loop? – zeawoas Sep 03 '21 at 10:09
4

In my experience, explicit loops are much faster than comprehensions.

Iterating over columns rather than rows is also good advice.

Besides, you can use the macros @simd and @inbounds to speed the loop up further.
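A rough sketch of how the two macros might be applied to a row-sum loop. `rowsums_fast` is a hypothetical name, and whether this is actually faster should be measured on your own data:

```julia
# Hypothetical helper: column-major row sums with bounds checks disabled.
function rowsums_fast(m::Matrix{T}) where {T}
    msum = zeros(T, size(m, 1))
    # @inbounds skips bounds checking; safe here because the loop ranges
    # come directly from size(m) and msum has size(m, 1) elements.
    @inbounds for j in 1:size(m, 2)
        # @simd permits vectorization of the innermost loop.
        @simd for i in 1:size(m, 1)
            msum[i] += m[i, j]
        end
    end
    return msum
end

m = [10i + j for i = 1:3, j = 1:4]
result = rowsums_fast(m)
println(result == vec(sum(m; dims=2)))  # true
```

Only use `@inbounds` when the indices are guaranteed valid, since an out-of-range access is no longer caught.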

Sisyphuss
0

In my case, I could not use the eachrow iterator, or nested loops, as I needed to zip eachindex with something else, and iterate over that zip iterator. Hence, I wrote:

ncols = size(m, 2)
for i in eachindex(m)
    rowi, coli = fldmod1(i, ncols)
    elem = m[rowi, coli]
end

Note that this will only work where eachindex returns linear indices. If eachindex returns an iterator of Cartesian indices, you may need to iterate over 1:prod(size(m)) (equivalently, 1:length(m)) instead.
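For illustration, here is a sketch of the zipped version using an explicit linear range. The `labels` sequence is a made-up stand-in for whatever you need to zip with:

```julia
m = [10i + j for i = 1:3, j = 1:4]
labels = 'a':'l'          # hypothetical second sequence with 12 elements

ncols = size(m, 2)
pairs = Tuple{Char,Int}[]
for (i, label) in zip(1:length(m), labels)
    # fldmod1 maps the counter i to (row, column) in row-major order.
    rowi, coli = fldmod1(i, ncols)
    push!(pairs, (label, m[rowi, coli]))
end

# The elements are visited row by row, i.e. in transposed storage order.
visited = last.(pairs)
println(visited == vec(permutedims(m)))  # true
```

Since this walks the matrix in row-major order, it trades cache friendliness for the convenience of a single flat loop.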

Jake Ireland