3

I don't know how to phrase this, but an example:

x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];

and then I'd like output

xs          = [1 4 5];
ys          = [5 2 4];
frequencies = [1 2 3]

(because the average 'y' at x=1 is 5, and the average 'y' at x=4 is (1+3)/2 = 2, and the average 'y' at x=5 is (3+4+5)/3 = 4).

I can compute this in a clumsy way but maybe there's a nice solution.

Frank Meulenaar
  • Don't have time to try it now, but perhaps the second answer to this question can be a good starting point: http://stackoverflow.com/questions/2880933/how-can-i-count-the-number-of-elements-of-a-given-value-in-a-matrix As you probably know the clumsy but simple way to do this would be to loop over `unique(x)`. – Dennis Jaheruddin Mar 18 '13 at 15:04

7 Answers

4

You can use the histogramming function histc to get each of the categories:

x = [ 1 4 4 5 5 5];
y = [ 5 1 3 3 4 5];
xs = unique(x);
[frequencies xb] = histc(x, xs); % counts the number of each unique occurrence
ysp = sparse(1:numel(x), xb, y); % a sparse matrix is a good way to organize the numbers
ys = full(sum(ysp)) ./ frequencies; % each column collects the y values of one unique x; dividing by the counts stays correct when y contains zeros or negative values

This gives you the three arrays you wanted. I think this is quite clean and efficient - no looping, only four lines of code.
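To make the intermediate steps easier to follow, here is a small sketch of what `histc` and the sparse matrix hold for the example data. The variable name `layout` is only illustrative and is not part of the answer above. Note that newer MATLAB releases document `histc` as not recommended; it is used here only because the answer relies on it.

```matlab
x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];
xs = unique(x);                    % xs = [1 4 5]
[frequencies, xb] = histc(x, xs);  % frequencies = [1 2 3], xb = [1 2 2 3 3 3]

% Row k of the matrix holds y(k), placed in the column of its x group:
layout = full(sparse(1:numel(x), xb, y));
% layout =
%      5     0     0
%      0     1     0
%      0     3     0
%      0     0     3
%      0     0     4
%      0     0     5
```

Summing each column and dividing by the number of entries in that group then gives the per-group mean.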

Floris
3
x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];
xs = unique(x);
[frequencies,bin] = histc(x,xs);
ys = arrayfun(@(i) mean(y(bin==i)), 1:length(xs));
ioums
  • +1: expression for `xs` can also be estimated by a running index for example `ys = arrayfun(@(k) mean(y(x==xs(k))), 1:length(xs));` – gevang Mar 18 '13 at 15:37
  • 3
    One thing to keep in mind: `arrayfun` may be neat, but it is usually slower than a loop. – Eitan T Mar 18 '13 at 15:39
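For anyone who wants to check the speed remark on their own machine, a rough timing sketch follows. The random test data and the names `ysA`, `ysL`, `tArrayfun` and `tLoop` are assumptions made for illustration; actual timings depend on the MATLAB version and the data, so no result is claimed here.

```matlab
n = 1e5;
x = randi(50, 1, n);              % hypothetical data: up to 50 distinct groups
y = rand(1, n);
xs = unique(x);
[~, bin] = histc(x, xs);          % bin(k) is the group index of x(k)

tic;
ysA = arrayfun(@(k) mean(y(bin == k)), 1:numel(xs));
tArrayfun = toc;

tic;
ysL = zeros(size(xs));            % preallocate the result
for k = 1:numel(xs)
    ysL(k) = mean(y(bin == k));
end
tLoop = toc;

fprintf('arrayfun: %.4f s, loop: %.4f s\n', tArrayfun, tLoop);
```

Both variants compute the same means, so the choice between them is about readability and speed only.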
2

@ioums's answer worked great for me, but there was a small mistake in the last line that showed up when I used input vectors other than the ones posted here. For example, after deleting the last element of each vector, the answer should be:

ys = [5 2 3.5]

The slightly corrected code is:

x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];
xs = unique(x);
[frequencies,bin] = histc(x,xs);
ys = arrayfun(@(i) mean(y(bin==i)), 1:length(xs));

I tried to edit @ioums's post, but the edit did not go through.

George
  • 1
    You are correct. I'm not sure why your edit would be rejected, but I have now made the edit myself. – ioums May 26 '14 at 17:18
  • Awesome, thanks (for both correcting and for saving me a lot of time in the first place)! – George May 26 '14 at 23:51
0

I'm not sure if this solution would be considered elegant enough, but this should work:

x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];
[xs,I,J] = unique(x);    %The value of the index vector I is not required here.
ys = zeros(size(xs));
frequencies = zeros(size(xs));
for i = 1:max(J)
    I = find(J==i);
    ys(i) = mean(y(I));
    frequencies(i) = length(I);
end
xs,ys,frequencies

The output would be:

xs =

     1     4     5


ys =

     5     2     4


frequencies =

     1     2     3

I hope this helps.

Roney Michael
  • 1
    Try not to use I or J as variables – HCAI Mar 18 '13 at 15:26
  • @user1134241: Sorry, but why? Is that a problem? – Roney Michael Mar 18 '13 at 16:04
  • This is exactly as I did it but didn't find it really elegant :) – Frank Meulenaar Mar 18 '13 at 16:31
  • @FrankMeulenaar: I wasn't sure it would be. :) At least the answer rustled up some activity on the question. :D Anyway, I hope you found what you were looking for in some other answer here. – Roney Michael Mar 18 '13 at 17:08
  • 2
    The problem with using `i` and `j` is that they are `sqrt(-1)` until you overwrite them. Other code that expects them to have their built-in value will subsequently fail. They are "not quite reserved" names... Matlab is full of them. – Floris Mar 18 '13 at 19:22
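A minimal sketch of the shadowing effect described in the comment above. The variable names `a`, `b` and `c` are chosen only for illustration.

```matlab
clear i j          % make sure the built-in meaning of i and j is in effect
a = 2 + 3*i;       % complex number 2 + 3i, because i is sqrt(-1) here
i = 5;             % shadow the built-in, e.g. by using i as a loop counter
b = 2 + 3*i;       % now plain arithmetic: b is the real number 17
c = 2 + 3i;        % the literal form 3i does not depend on the variable i
```

Writing complex constants in the literal form (`3i`, `1i`) keeps them unaffected by a variable named `i`.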
0
x = [1 4 4 5 5 5]';
y = [5 1 3 3 4 5]';

%this can probably be done smarter...

indexlong=accumarray(x,x,[],@mean)'
meanlong=accumarray(x,y,[],@mean)'
frequencieslong=accumarray(x,1)'

%leave out zeros

takethese=(indexlong>0);
xs=indexlong(takethese)
ys=meanlong(takethese)
frequencies=frequencieslong(takethese)
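Using `x` directly as the subscript for `accumarray` only works when `x` contains positive integers, and it allocates one slot for every integer up to `max(x)`. A possible variation, sketched here under the assumption that `x` may hold arbitrary numeric values, is to group by the index vector that `unique` returns as its third output (named `idx` here for illustration):

```matlab
x = [1 4 4 5 5 5];
y = [5 1 3 3 4 5];

[xs, ~, idx] = unique(x);                     % idx(k) is the position of x(k) in xs
frequencies = accumarray(idx(:), 1).';        % number of occurrences of each unique x
ys = accumarray(idx(:), y(:), [], @mean).';   % mean of y within each group
```

Because the subscripts run from 1 to `numel(xs)`, no zero entries need to be filtered out afterwards.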
0

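Here is my code, hope it helps...

   [x, order] = sort(x);   % sort x and remember the permutation
   y = y(order);           % reorder y so each value stays paired with its x
   ind = 1;
   for i = 1:length(x)
       if (i > 1 && x(i) == x(i-1))
           continue;       % skip repeats of a value that was already processed
       end
       xs(ind) = x(i);                               % next unique value
       freq(ind) = sum(x == x(i));                   % how often it occurs
       ys(ind) = sum((x == x(i)) .* y) / freq(ind);  % mean of the matching y values
       ind = ind + 1;
   end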
Muhammad
0

Though I would recommend one of the histogram approaches, here is how I would do it in a loop. It is not much different from some of the other solutions, but I believe it is a little nicer, so I will post it anyway.

xs = unique(x)
for t = 1:length(xs)
   idx = x == xs(t);
   ys(t) = mean(y(idx));
   frequencies(t) = sum(idx);
end
Dennis Jaheruddin