
I'm working with MATLAB R2019b.

x = [10 10 10 20 20 30]';

How do I get a running count of each value's occurrences in x (a cumulative count per unique element)? The result should look like:

y = [1 2 3 1 2 1]';

EDIT:

My real array is actually much longer than the example given above. Below are the methods I tested:

x = randi([1 100], 100000, 1);
x = sort(x);

% method 1: check neighboring values in one loop (requires sorted x)
tic
y = ones(size(x));
for ii = 2:length(x)
    if x(ii) == x(ii-1)
        y(ii) = y(ii-1) + 1;
    end
end
toc

% method 2 (Wolfie): count occurrence of unique values explicitly
tic
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
    idx = (x == u(ii));
    y(idx) = 1:nnz(idx);
end
toc

% method 3 (Luis Mendo): triangular matrix
tic
y = sum(triu(x==x'))';
toc

Results:

Method 1: Elapsed time is 0.016847 seconds.
Method 2: Elapsed time is 0.037124 seconds.
Method 3: Elapsed time is 10.350002 seconds.
data-monkey

4 Answers


EDIT:
Assuming that x is sorted:

x = [10 10 10 20 20 30].';
x = sort(x);

d = [1 ;diff(x)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = (2 : numel(x) + 1).' - ic;

When x is unsorted use this:

[s, is] = sort(x);
d = [1 ;diff(s)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = zeros(size(x));   % preallocate so y has the same shape as x
y(is) = (2 : numel(s) + 1).' - ic;
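To see why this works, here is a sketch that traces the intermediate values on a small unsorted input. The input vector and the preallocation of y are additions for illustration only. After `d(f) = f`, every position that starts a new run holds its own index; `cummax` then carries that start index through the rest of the run, so subtracting it from the shifted position gives the running count.

```matlab
x = [20 10 20 30 10 10].';      % illustrative unsorted input (assumption)

[s, is] = sort(x);              % s = [10 10 10 20 20 30].', is = [2 5 6 1 3 4].'
d = [1; diff(s)];               % [1 0 0 10 0 10].': nonzero where a new run starts
f = find(d);                    % [1 4 6].': start index of each run
d(f) = f;                       % [1 0 0 4 0 6].'
ic = cummax(d);                 % [1 1 1 4 4 6].': start index carried through each run
y = zeros(size(x));             % preallocate so y keeps the shape of x
y(is) = (2 : numel(s) + 1).' - ic;   % [1 2 3 1 2 1].' scattered back to input order
disp(y.')
```

Scattering back through `is` relies on `sort` being stable, so equal values keep their original relative order.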

Original answer (works only in GNU Octave, where cummax also returns indices):
Assuming that x is sorted:

x = [10 10 10 20 20 30].';
x = sort(x);

[~, ic] = cummax(x);
y = (2 : numel(x) + 1).' - ic;

When x is unsorted use this:

[s, is] = sort(x);
[~, ic] = cummax(s);
y = zeros(size(x));   % preallocate so y has the same shape as x
y(is) = (2 : numel(s) + 1).' - ic;
rahnema1

You could loop over the unique values and, for each one, number its occurrences 1:n:

u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
    idx = (x == u(ii));
    y(idx) = 1:nnz(idx);
end
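As a quick check, the sketch below runs the same loop on a small unsorted vector (the input is made up for illustration). It suggests the loop does not need x to be sorted, since each mask picks up occurrences in their order of appearance. The trade-off is that every iteration scans all of x, so the cost grows with the number of unique values times the length of x.

```matlab
x = [20 10 20 30 10 10].';   % illustrative unsorted input (assumption)

u = unique(x);               % [10 20 30].'
y = zeros(size(x));
for ii = 1:numel(u)
    idx = (x == u(ii));      % logical mask of every occurrence of this value
    y(idx) = 1:nnz(idx);     % number the occurrences in order of appearance
end
disp(y.')
```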
Wolfie

This is a little inefficient because it builds the full N-by-N comparison matrix, when only its upper triangular half is actually needed:

y = sum(triu(x==x.')).';
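A minimal sketch of what the one-liner computes, split into steps on the question's small example. It assumes implicit expansion (R2016b or later) for `x == x.'`. The memory estimate at the end is a rough back-of-the-envelope figure based on one byte per logical element.

```matlab
x = [10 10 10 20 20 30].';

M = (x == x.');          % 6-by-6 logical; M(i,j) is true when x(i) == x(j)
U = triu(M);             % keep only pairs with i <= j
y = sum(U, 1).';         % column j counts matches at or before position j

% Rough memory estimate for the full comparison matrix (1 byte per logical)
n = 100000;              % length used in the question's benchmark
bytesNeeded = n^2;       % 1e10 bytes
fprintf('x == x.'' for n = %d needs about %.1f GB\n', n, bytesNeeded / 1e9);
```

The quadratic size of the intermediate matrix is a plausible reason for the gap between this method and the others on long inputs.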
Luis Mendo
  • This is concise but as you said inefficient. Turns out it's not just a little :) My real array is much longer. See my edits. – data-monkey Sep 23 '20 at 18:13
  • I see. It would have helped if you had included that information from the beginning :-) – Luis Mendo Sep 23 '20 at 18:47
  • Anyway, in my comparisons the matrix method fares between the other two you used in your timing, as long as the required matrix fits in the available memory. I'm using R2017b – Luis Mendo Sep 23 '20 at 20:21

Here's a no-for-loop version. On my machine it's a bit faster than the previous working methods:

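% if x is already sorted, skip the sort (use s = x) and finish with y = z
[s, is] = sort(x);
[~, ~, iu] = unique(s);        % group index of each element of s
c = accumarray(iu, 1);         % number of occurrences of each unique value
cs = cumsum([0; c]);           % number of elements preceding each group
z = (1:numel(x))' - repelem(cs(1:end-1), c);
y = zeros(size(x));            % preallocate so y has the same shape as x
y(is) = z;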
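One way to gain confidence in the vectorized version is to cross-check it against the loop over unique values from this thread on random unsorted input. The sketch below does that. The test size and the name `yRef` are arbitrary choices for illustration, and x is assumed to be a column vector.

```matlab
x = randi([1 100], 1000, 1);        % random unsorted test input (assumption)

% Vectorized version
[s, is] = sort(x);                  % stable sort keeps equal values in input order
[~, ~, iu] = unique(s);             % group index of each sorted element
c = accumarray(iu, 1);              % run length of each unique value
cs = cumsum([0; c]);                % number of elements before each run
y = zeros(size(x));
y(is) = (1:numel(x)).' - repelem(cs(1:end-1), c);

% Reference: loop over unique values (works on unsorted input)
u = unique(x);
yRef = zeros(size(x));
for ii = 1:numel(u)
    idx = (x == u(ii));
    yRef(idx) = 1:nnz(idx);
end

assert(isequal(y, yRef), 'vectorized and loop results differ');
```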
Alec Jacobson