
MATLAB is well-known for being column-major. Consequently, manipulating entries of an array that are in the same column is faster than manipulating entries that are on the same row.

In that case, why do so many built-in functions, such as linspace and logspace, output row vectors rather than column vectors? This seems to me like a de-optimization...

What, if any, is the rationale behind this design decision?

jub0bs
  • That's a very good question! My hunch would probably be to support legacy behaviour. Perhaps older versions of MATLAB had it as row vectors initially, and are just keeping that shape to preserve legacy behaviour.... but that's really just a guess. I'm curious to know the answer to this question myself. – rayryeng Dec 15 '14 at 19:30
  • Because those outputs are 1D (the first dimension is singleton)? It doesn't matter really, but I'd guess because it is easier to inspect the output on the command line with row vectors. – chappjc Dec 15 '14 at 19:31
  • @chappjc I personally prefer columns in my output. I can't bear those "Columns 1 to ..." headings in the Command Window... – jub0bs Dec 15 '14 at 19:33
  • It is more compact that way and easier to read for small vectors, true. But for a long row vector, at least it will wrap lines. – chappjc Dec 15 '14 at 19:34
  • @rayryeng But the first version of MATLAB was written in FORTRAN, which is itself column-major... – jub0bs Dec 15 '14 at 19:34
  • I never talked about the first version. I talked about **older** versions. Either way, I think I'll agree with chappjc in that it's simply for easier readouts. – rayryeng Dec 15 '14 at 19:36
  • @rayryeng But then, why would other functions, such as `diag` (when applied to a matrix), return a column vector? Why this inconsistency? – jub0bs Dec 15 '14 at 19:39
  • You're asking the wrong dude. Sorry! Maybe contact MathWorks? – rayryeng Dec 15 '14 at 19:49

1 Answer


It is a good question. Here are some ideas...

My first thought was that in terms of performance and contiguous memory, it doesn't make a difference whether a vector is a row or a column: both are contiguous in memory. For a matrix or higher-dimensional array, it is indeed more efficient to index a whole column (e.g. v(:,2)) than a row (e.g. v(2,:)) or a slice along another dimension, because only the elements of a column are contiguous in memory. A 1-by-N row vector, however, has only one row, so its elements are contiguous too and the orientation makes no difference.
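To make the contiguity argument concrete, here is a minimal sketch (the variable names are made up for illustration). It relies on the fact that `A(:)` lists the elements of an array in linear-index order, which for MATLAB's column-major layout is the storage order:

```matlab
% A(:) walks the array in linear-index (column-major) order.
A = [1 2 3; 4 5 6];              % 2-by-3 matrix
memOrder = A(:).';               % elements in storage order, shown as a row

% The linear index of element (row, col) in an m-by-n array is (col-1)*m + row.
m = size(A, 1);
row = 2; col = 3;
k = (col - 1)*m + row;
sameIndex = (k == sub2ind(size(A), row, col));

% For a 1-by-N row vector m = 1, so neighbouring elements have
% neighbouring linear indices, exactly as in an N-by-1 column vector.
r = linspace(0, 1, 5);           % row vector
c = r(:);                        % column vector with the same elements
sameOrder = isequal(r(:), c(:));
```

Elements on the same row of `A` are `m` positions apart in this ordering, which is why slicing a row of a matrix is the slower case, while a row vector has no such stride.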

Second, it is simply easier to display row vectors in the Command Window, especially since it wraps the rows of long arrays. A column vector forces you to scroll even when the array is much shorter.

More thoughts...

Perhaps row vector output from linspace and logspace is just to be consistent with the fact that colon (essentially a tool for creating linearly spaced elements) makes a row:

>> 0:2:16
ans =
     0     2     4     6     8    10    12    14    16

The choice was made at the beginning of time and that was that (maybe?).
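The parallel between colon and linspace can be checked directly. This is only a sketch: the endpoints and the tolerance are arbitrary choices for illustration.

```matlab
a = 0:2:16;                      % colon operator: start, step, stop
b = linspace(0, 16, 9);          % same endpoints, 9 evenly spaced points

bothRows = isrow(a) && isrow(b); % both results are row vectors

% Compare with a tolerance rather than isequal, since the two functions
% need not compute their values in the same way.
closeEnough = max(abs(a - b)) < 1e-12;

% When a column is wanted, the orientation is easy to change afterwards.
bCol = b(:);                     % 9-by-1 column vector
```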

Also, the convention for loop variables could be important. A row is necessary to define multiple iterations:

>> for k=1:5, k, end
k =
     1
k =
     2
k =
     3
k =
     4
k =
     5

A column will be a single iteration with a non-scalar loop variable:

>> for k=(1:5)', k, end
k =
     1
     2
     3
     4
     5

And maybe the outputs of linspace and logspace are commonly looped over. Maybe? :)
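As a small illustration of that point (the counter names are hypothetical), looping over the output of linspace directly visits each element once, whereas looping over the same values stored as a column runs the body a single time:

```matlab
x = linspace(0, 1, 5);           % 1-by-5 row vector

nRow = 0;
for xi = x                       % xi is a scalar on each pass
    nRow = nRow + 1;
end

nCol = 0;
for xi = x(:)                    % xi is the whole 5-by-1 column
    nCol = nCol + 1;
end
% nRow ends up as 5 and nCol as 1.
```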

But, why loop over a row vector anyway? Well, as I say in my comments, it's not that a row vector is used for loops, it's that it loops through the columns of the loop expression. Meaning, with for v=M where M is a 2-by-3 matrix, there are 3 iterations, where v is a 2 element column vector in each iteration. This is actually a good design if you consider that this involves slicing the loop expression into columns (i.e. chunks of contiguous memory!).
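A short sketch of that column-wise iteration, using an arbitrary 2-by-3 matrix chosen for illustration:

```matlab
M = [1 2 3; 4 5 6];              % 2-by-3 matrix

nIter = 0;
colSums = zeros(1, size(M, 2));
for v = M                        % v is a 2-by-1 column on each pass
    nIter = nIter + 1;
    colSums(nIter) = sum(v);
end
% nIter ends up as 3 and colSums as [5 7 9].
```

Each `v` is one column of `M`, i.e. one contiguous chunk of the underlying data.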

chappjc
  • Your answer raises more questions than it answers `:)` (+1 anyway). Why did MathWorks decide that a row vector should be used for `for` loops? Wouldn't it have made more sense to use a column vector? Same question about the colon operator. – jub0bs Dec 15 '14 at 19:55
  • @Jubobs It sure does! As a developer, I think it is actually fairly likely that some dev made an arbitrary decision in the early days of the product and it got locked in. Still, that doesn't mean the `colon` convention has anything to do with the `linspace` convention, but it's very likely considering that's what `colon` does. – chappjc Dec 15 '14 at 19:57
  • @Jubobs Actually, it's not that a row vector is used for loops, it's that it loops through the columns of the loop expression. Meaning, with `for v=M` where `M` is a 2-by-3 matrix, there are 3 iterations, where `v` is a 2 element column vector in each iteration. This is actually a good design if you consider that this involves slicing the loop expression into columns (i.e. chunks of contiguous memory!). – chappjc Dec 15 '14 at 20:23
  • Yes, that actually makes sense. – jub0bs Dec 15 '14 at 21:06
  • I liked your first comment to the OP above about both cases being vectors (the elements of row and column vectors are both stored contiguously) and rows being easier for Command Window display. You might reiterate those in your answer. – horchler Dec 15 '14 at 22:30
  • @chappjc *My first thought was that in terms of performance and contiguous memory.* According to the blog post I link to in my question, there *is* a difference in performance; see the test involving two loops towards the end. Unfortunately, I haven't had the chance to rerun the benchmark and see the results with my own eyes, but I will as soon as possible. – jub0bs Dec 15 '14 at 23:12
  • @chappjc You had my +1 since your comment :-) – Luis Mendo Dec 15 '14 at 23:13
  • @Jubobs That blog post is demonstrating slicing multidimensional arrays. It is absolutely correct that it is more efficient to index a whole column of a 2D array (e.g. `v(:,2)`) rather than a row (e.g. `v(2,:)`) because in the row case it is not accessing elements that are contiguous in memory. However, for a row vector that is 1-by-N, the elements are contiguous because there is only one row. That is why it doesn't matter. – chappjc Dec 15 '14 at 23:16
  • @chappjc Ok; again, now that you've spelled it out, that makes sense. Your answer remains speculative, but it's good enough for me :) – jub0bs Dec 15 '14 at 23:18
  • @Jubobs In MATLAB, the underlying data buffers have no padding - every element is in a contiguous chunk of memory. I'm not sure where to point in the docs, but using MEX extensively, you know this because you have to access this buffer directly. See the docs for [`mxGetPr`](http://www.mathworks.com/help/matlab/apiref/mxgetpr.html) where it says "Once you have the starting address, you can access any other element in the mxArray". Since there is no notion of padding or stride that is not equal to the number of rows, a row vector must have contiguous data. – chappjc Dec 15 '14 at 23:28
  • @chappjc Ok, thanks. I think my question actually mainly stems from a misunderstanding of precisely this. Thanks again! – jub0bs Dec 15 '14 at 23:30