Matlab `rowfun` function with multiple outputs: Safe to assume row order?

Question

I tried passing rowfun a function that returns a multi-row output of the same height as its input. It seems to work as expected.

% Example table with 2-column-array as a single data field
x = table( [1;1;2;2] , [[2;2;1;1] [2;1;2;1]] , ...
           'VariableNames' , {'idx' 'Field2columns'} )

  x =
      idx    Field2columns
      ___    _____________
      1      2    2
      1      2    1
      2      1    2
      2      1    1

% Example anonymous function: takes all rows with the same idx value and
% reverses their row order
y = rowfun( @(z) z(end:-1:1,:) , x , 'Input','Field2columns' , ...
            'Grouping','idx' , 'OutputVar','OutVar' )

  y =
             idx    GroupCount    OutVar
             ___    __________    ______
      1      1      2             2    1
      1_1    1      2             2    2
      2      2      2             1    1
      2_1    2      2             1    2

% Append the generated data to original table
[ x y(:,{'OutVar'}) ]

  ans =
             idx    Field2columns    OutVar
             ___    _____________    ______
      1      1      2    2           2    1
      1_1    1      2    1           2    2
      2      2      1    2           1    1
      2_1    2      1    1           1    2

This makes for very efficient code. Otherwise, I would have to loop through all distinct values of x.idx, extract the matching rows of x for each value, generate the row-reversed subset, and compile the results.
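For comparison, the loop described above might look roughly like the following. It is a sketch that rebuilds the example table `x`; the variable names `OutVar` and `idxRows` are only illustrative. Because each reversed subset is written back through the row indices returned by `find`, its alignment with `x` relies only on ordinary indexing, not on any ordering behaviour of `rowfun`.

```matlab
% Example table, as in the question
x = table([1;1;2;2], [[2;2;1;1] [2;1;2;1]], ...
          'VariableNames', {'idx' 'Field2columns'});

OutVar = zeros(size(x.Field2columns));   % preallocate, same size as the input field
for v = unique(x.idx)'                   % loop over the distinct idx values
    idxRows = find(x.idx == v);          % rows with this idx, in table order
    % write the row-reversed subset back to the same row positions
    OutVar(idxRows, :) = x.Field2columns(idxRows(end:-1:1), :);
end
x.OutVar = OutVar;                       % append as a new table variable
```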

My only concern is that I am assuming that the row order of the anonymous function's output is maintained, and that each output row aligns with the corresponding row of table x. For example, for idx==7, the Nth row of x with idx==7 gets the Nth row of the anonymous function's output (when the function is applied to x(x.idx==7,:)) appended on its right.

The rowfun documentation doesn't cover the case in which the first argument is a function that returns a multi-row output, so I have only the observed behaviour to rely on. Would it be advisable to exploit this behaviour to streamline my code, or is it bad practice to rely on undocumented behaviour? For example, corner cases may not be covered, and TMW (The MathWorks) is under no obligation to maintain the current behaviour in future releases.

score 1 · Answer 1 · answered Jan 12 '19 at 15:18


The documentation for rowfun, under 'GroupingVariables', says:

The output, B, contains one row for each group.

So if you get more than one row per group, you are definitely treading in undocumented waters. A future version could throw an error with your code.

Regarding the order of the input rows to your function: I suggest asking MathWorks about the order of rows that share the same grouping-variable values. One way is to go to the bottom of the documentation page, select a star rating, and then say in the text box that the documentation is incomplete because it doesn't specify the order of the rows when this option is given. The documentation folks like the docs to be thorough and complete; they might answer this question by completing the documentation.
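A minimal sketch of staying within the documented one-row-per-group behaviour using rowfun itself: the function wraps each group's multi-row result in a 1x1 cell, so rowfun returns exactly one row per idx value. Note that stacking the cells with cell2mat and placing the result beside x still assumes that x is sorted by idx and that the rows within each group reach the function in table order, so this removes the multi-row issue but not the ordering one.

```matlab
x = table([1;1;2;2], [[2;2;1;1] [2;1;2;1]], ...
          'VariableNames', {'idx' 'Field2columns'});

% Each group's reversed rows are returned inside a 1x1 cell,
% so the output has one row per group, as documented
y = rowfun(@(z) {z(end:-1:1,:)}, x, ...
           'InputVariables', 'Field2columns', ...
           'GroupingVariables', 'idx', ...
           'OutputVariableNames', 'OutVar');

% y.OutVar is a cell column with one matrix per idx value;
% y.idx identifies which group each cell belongs to
stacked = cell2mat(y.OutVar);   % stack the per-group results vertically
```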

answered Jan 12 '19 at 15:18

Cris Luengo


I did submit a suggestion to improve the documentation, referring to this thread as an example. I believe the issue is more complex than the ordering of input rows. Once the first `rowfun` argument is applied to each group, there's also the question of how the resulting rows are ordered. Indeed, the resulting rows for any one group don't even have to be lumped together. – user36800 Jan 14 '19 at 02:26
Even in the documented case of one row output per group, the output row order is not guaranteed, but at least the grouping variable(s) serve as a key. For the case of multi-row output per group, the grouping variable takes on the same value for all the rows, so you can't use it to get to the right row. – user36800 Jan 14 '19 at 02:29
Oh, and thanks for weighing in. Now I don't feel like I'm being wastefully, overly paranoid about protecting my code from behaviour that is not guaranteed. – user36800 Jan 14 '19 at 02:49
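In the documented one-row-per-group case, using the grouping variable as a key (rather than relying on row positions) can be done with join. A sketch, using per-group column sums as a stand-in result; the choice of sum and the name ColSum are purely illustrative. This only works because each group yields exactly one result row, so idx is a unique key in the rowfun output.

```matlab
x = table([1;1;2;2], [[2;2;1;1] [2;1;2;1]], ...
          'VariableNames', {'idx' 'Field2columns'});

% One output row per group: the column sums of that group's rows
s = rowfun(@(z) sum(z, 1), x, ...
           'InputVariables', 'Field2columns', ...
           'GroupingVariables', 'idx', ...
           'OutputVariableNames', 'ColSum');

% Match each row of x to its group result through the key idx,
% so the row order of s does not matter
xs = join(x, s(:, {'idx' 'ColSum'}));
```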

score 1 · Accepted Answer · answered Jan 12 '19 at 19:56


If you want to stay in the documented zone, you can use the very handy splitapply for that. To deal with the multiple rows in the output you can put them in a cell, and then convert it to a table:

y = splitapply(@(z) {z(end:-1:1,:)},x.Field2columns,x.idx) % note the {...} in the function
[x table(cell2mat(y),'VariableNames',{'OutVar'})] % this is like: [x y(:,{'OutVar'})]

I guess this will be less efficient, but it keeps your code within the documented behaviour of the functions, without a need to use loops.

answered Jan 12 '19 at 19:56

EBH


Thanks, EBH. That looks like it could work. I can use the same cell-wrapping strategy with `rowfun` (which I favour over `splitapply` for its one-statement brevity). I usually use the syntax that includes the grouping variable in the output, just because I don't like having to assume that the output will have the same ordering as the grouping-variable input argument. – user36800 Jan 14 '19 at 02:47
Actually, it's messier than I thought. If you don't assume anything about the row ordering of the input table `x`, then the rows for (say) `idx==1` don't even have to be contiguous. I would still have to cycle through each `idx` value, identifying the matching rows each time and applying the anonymous function. To avoid this problem, I could sort the rows of `x` by `idx` beforehand, and sort the rows of `y` by `idx` as well. But that erodes the advantage of `splitapply`/`rowfun` over simply looping over the `idx` values. I'm beginning to think that I'm trying to cram too much into one statement. – user36800 Jan 14 '19 at 05:13
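One way to sidestep these ordering questions without sorting x first is to pass each row's original position into the function together with the data, and use it to write the results back. Below is a sketch with the documented findgroups and splitapply, assuming the data are numeric so they can be concatenated with the row positions; the helper revKeepIdx and the other variable names are hypothetical. Because each group is sorted by row position inside the function, the result depends neither on the order in which splitapply passes a group's rows nor on those rows being contiguous in x.

```matlab
x = table([1;1;2;2], [[2;2;1;1] [2;1;2;1]], ...
          'VariableNames', {'idx' 'Field2columns'});

rowIdx = (1:height(x))';     % original row positions, carried with the data
g = findgroups(x.idx);       % group number (1..numGroups) for every row

% Hypothetical helper: given [rowIdx data] sorted by rowIdx, keep the
% row positions in place and reverse the order of the data rows
revKeepIdx = @(s) [s(:,1), s(end:-1:1, 2:end)];

% Each group returns a 1x1 cell holding [rowIdx reversedData]
parts = splitapply(@(r, z) {revKeepIdx(sortrows([r z]))}, ...
                   rowIdx, x.Field2columns, g);

res = cell2mat(parts);                   % stack the results of all groups
OutVar = zeros(size(x.Field2columns));
OutVar(res(:,1), :) = res(:, 2:end);     % put each row back at its original position
x.OutVar = OutVar;
```

The extra sortrows per group is the price of making the within-group order explicit rather than assumed.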
