Return the maximum value of every subgroup in a table

Question

I have a section of table like this:

X     Y    Value
__   __    __   
1    2     6.9   
1    3     6.8   
1    4     8.1 
2    1     7.2 
2    3     11.7
2    4     16
3    1     22.6
3    2     20.5
3    3     18.1
… … …

For each group of rows having the same X, I need to select only the row having the largest Value. How can I generate a table like this?

X     Y    Value 
__   __    __   
1   4      8.1 
2   4      16
3   1      22.6

The code I have so far produces just one row:

X = [1; 1; 1; 2; 2; 2; 3; 3; 3];
Y = [2; 3; 4; 1; 3; 4; 1; 2; 4];
Value = [6.9; 6.8; 8.1;7.2;11.7;16;22.6;20.5;18.1];
T = table(X,Y,Value);
[~,maxidx] = max(Value);
T(maxidx,:)
%{
ans =
  1×3 table
    X    Y    Value
    _    _    _____
    3    1    22.6 
%}

Instead of having Y as a vector, maybe try to create a 2d matrix. Every row (or column if you like) would correspond to a value from X. Then you could try using max() on this matrix. — Aristotelis, Jun 02 '19 at 08:40
The [edit] feature is not meant to discuss issues you have with proposed solutions. Please use the comments for that, or ask a new question. — Dev-iL, Jun 02 '19 at 14:17

Dev-iL · Accepted Answer · 2019-06-03T07:15:59.517

3

If you're using R2015b or newer, you could use splitapply:

function T2 = q56413455()
% Define some example inputs:
X = [1; 1; 1; 2; 2; 2; 3; 3; 3];
Y = [2; 3; 4; 1; 3; 4; 1; 2; 4];
Value = [6.9; 6.8; 8.1;7.2;11.7;16;22.6;20.5;18.1];
T = table(X,Y,Value);
% Call the "business logic" and assign the output:
T2 = getMaxRows(T);

function out = getMaxRows(T)
GROUPING_VAR = 1; % We assume that the 1st column contains the grouping variable
varnames = T.Properties.VariableNames;
tmp = splitapply(@maxrow, T, T.(varnames{ GROUPING_VAR }));
out = array2table(tmp, 'VariableNames', varnames );

function outrow = maxrow(varargin)
COL_WITH_MAX = 3; % We assume that the 3rd columns is the one that matters for max()
subArr = cell2mat(varargin);
[~,I] = max(subArr(:,COL_WITH_MAX));
outrow = subArr(I,:);

Calling this produces the desired result:

>> q56413455
ans =
  3×3 table
    X    Y    Value
    _    _    _____
    1    4     8.1 
    2    4      16 
    3    1    22.6

Another variation uses the 2^nd output of splitapply(@max, ...) which is the index of the maximum within the group. We then need to add the amount of elements in previous groups to this (this is done using diff):

X = [1; 1; 1; 2; 2; 2; 3; 3; 3];
Y = [2; 3; 4; 1; 3; 4; 1; 2; 4];
Value = [6.9; 6.8; 8.1;7.2;11.7;16;22.6;20.5;18.1];
T = table(X,Y,Value);

% Get the position of the maximum Value in every group
[~,I] = splitapply(@max, T.Value, T.X); % I == [3; 3; 1]

% Get beginnings of every group
lastGroupEnd = find([1; diff(X)])-1; % lastGroupEnd == [0; 3; 6]

% Offset the maximum positions by group starts to get row indices in the original table
T2 = T(I + lastGroupEnd, :);

edited Jun 03 '19 at 07:15

answered Jun 02 '19 at 09:20

Dev-iL

23,742
7
57
99

Is it possible to do it without using the functions out and outrow?. Furthermore, I changed the splitapply line to out = splitapply(@maxrow, [T.X T.Y T.Value], T.X); and add above it T = table(X,Y,Value);. How do I store the ans as a new table? – jane Jun 02 '19 at 11:19
Something like this: [`T2 = array2table(out, 'VariableNames', T.Properties.VariableNames)`](https://www.mathworks.com/help/matlab/ref/array2table.html). – Dev-iL Jun 02 '19 at 12:20
Thanks. Add the line T2 = array2table(out, 'VariableNames',{'X','Y','Value'}) after the out =.. line. It does not store it in the workspace, any reason why? – jane Jun 02 '19 at 14:02
I don't know what exactly you're doing, so it's almost impossible to know. If you're using a function to do this, like in my example, and the output of the function is `out` (and not `T2`), you wouldn't get a different output unless you redefine `out` or set the output to be `T2`. If this doesn't solve your problem and you want to explain it in further detail, feel free to contact me in [chat](https://chat.stackoverflow.com/rooms/81987/). – Dev-iL Jun 02 '19 at 14:06
Thanks. I updated the code in the question. Maybe you can see what is wrong. – jane Jun 02 '19 at 14:16
@jane it is exactly as I said. You have the wrong output from the function. One way to solve it is to change `T2 = array2table ... ` to `out = array2table ...`. But more importantly, you should read about how [functions](https://www.mathworks.com/help/matlab/ref/function.html) work in MATLAB (inputs, outputs, etc.) because your last question indicates that you're struggling with some general concepts. – Dev-iL Jun 02 '19 at 14:20
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194335/discussion-between-jane-and-dev-il). – jane Jun 02 '19 at 14:33

Adam · Answer 2 · 2019-06-02T18:06:40.460

0

Use unique and cumsum

X = [1; 1; 1; 2; 2; 2; 3; 3; 3];
Y = [2; 3; 4; 1; 3; 4; 1; 2; 4];
values = [6.9; 6.8; 8.1;7.2;11.7;16;22.6;20.5;18.1];

T = table(X,Y,values);

% Find number of categorical X and corresponding category X 
[count,category]=hist(X,unique(X));

% Starting offset index of each category, X = 2, offset is 3, X = 3, offset is 6
location = cumsum(count);

maxidx = zeros(1,numel(category));

for i = 1:numel(category)
    [~,maxidx(i)] = max(T(T.X == category(i) , :).values);
if i == 1
    % First category, no need offset 
else
    % Locate max index in whole table by adding offset to the initial index
    maxidx(i) = maxidx(i) + location(i-1);
end
end
T(maxidx,:)
%{
ans =

  3×3 table

    X    Y    values
    _    _    ______

    1    4      8.1 
    2    4       16 
    3    1     22.6 
    %}

edited Jun 02 '19 at 18:06

answered Jun 02 '19 at 09:52

Adam

2,726
1
9
22

1

I though about this too, but such an approach fails if we have repeating values in different groups, where a maximum in one group also exists in some previous group, but it is not the maximum there (i.e. try this with `Value = [16;17;18;7.2;11.7;16;22.6;20.5;18.1];` and see what happens). This is exactly why I decided to return whole rows and not row indices. This solutions works for _most_ (but not **all**) inputs, which makes it dangerous and difficult to debug later on. – Dev-iL Jun 02 '19 at 14:27
@Dev-iL check this one, think it may work perfectly – Adam Jun 02 '19 at 17:40
The idea seems correct. Several comments/suggestions: 1) You appear to have re-implemented `splitapply` - if you want to see how this can be done using `splitapply`, check out my (now updated) answer. 2) You could get rid of the `if` if you'd precompute like this: `location = location-location(1);` (the offset of the first element will be 0). 3) `hist` is deprecated, one should use `histcounts` or `discretize` instead. – Dev-iL Jun 03 '19 at 07:28

Eliahu Aaron · Answer 3 · 2019-06-02T16:42:40.190

0

You can do it using a loop on unique X values:

X = [1; 1; 1; 2; 2; 2; 3; 3; 3];
Y = [2; 3; 4; 1; 3; 4; 1; 2; 4];
Value = [6.9; 6.8; 8.1;7.2;11.7;16;22.6;20.5;18.1];

uniqueX = unique(X); % Get 'X' unique values
maxidx = zeros(size(uniqueX));
for i = 1:length(uniqueX)
    xind = find(X == uniqueX(i)); % Find all indices of a unique 'X' value
    [~,vind] = max(Value(xind)); % Find index of max Value in 'xind' group
    maxidx(i) = xind(vind); % Get the index in the original group 
end

T(maxidx,:)

Output:

ans =
  3×3 table
    X    Y    Value
    _    _    _____
    1    4     8.1 
    2    4      16 
    3    1    22.6

edited Jun 02 '19 at 16:42

answered Jun 02 '19 at 16:32

Eliahu Aaron

4,103
5
27
37

Unfortunately you cannot see my comment on the now-deleted answer, so I'll write it again: using `find` is problematic because such an approach fails if we have repeating values in different groups, where the maximum of one group also exists in some previous group, but it is not the maximum there (i.e. try this with `Value = [16;17;18;7.2;11.7;16;22.6;20.5;18.1];` and see what happens). This is exactly why I decided to return whole rows and not row indices. This solutions works for most (but not all) inputs, which makes it dangerous and difficult to debug later on. – Dev-iL Jun 02 '19 at 16:35
@Dev-iL: Yes you are right, I removed `splitapply` from my answer. – Eliahu Aaron Jun 02 '19 at 16:44

Return the maximum value of every subgroup in a table

3 Answers3