How to efficiently find and merge duplicate entries of a vector in MATLAB?

Question

I am writing MATLAB code to quickly solve the following problem: let X be a random variable distributed according to P(x). Take two independent copies of X, call them X1 and X2, and find the distribution of Y = f(X1,X2), where f(·,·) is a known function.

To solve the above, I start with two vectors x and p such that p(i) = P(x(i)). Suppose they both contain n elements. I can easily compute the n-by-n matrix y such that y(i,j) = f(x(i), x(j)). Furthermore, I can compute the n-by-n matrix p_out such that p_out(i,j) = p(i) * p(j). If the value y(i,j) occurs only once in y, this means P(Y = y(i,j)) = p_out(i,j).
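A minimal sketch of this setup. The three-point distribution and the choice of f as addition are assumptions made purely for illustration; they are not part of the original problem.

```matlab
% Hypothetical example data: a three-point distribution
x = [0 1 2];
p = [0.2 0.5 0.3];

% Hypothetical choice of f; any elementwise binary function works here
f = @(a, b) a + b;

% ndgrid gives X1(i,j) = x(i) and X2(i,j) = x(j)
[X1, X2] = ndgrid(x, x);
y = f(X1, X2);            % y(i,j) = f(x(i), x(j))
p_out = p(:) * p(:).';    % outer product: p_out(i,j) = p(i) * p(j)
```

With this f, several entries of y coincide (for example y(1,2) and y(2,1)), which is exactly the situation where merging becomes necessary.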

Now, if all elements of y are distinct, we are almost done. All that remains is to convert the matrices to vectors and perhaps sort them for a tidy output. Suppose we do this by setting

y = y(:);
p_out = p_out(:);
[y, idx] = sort(y);
p_out = p_out(idx);

The problem, however, is that the elements of y are typically not unique. I therefore have to merge the identical elements of y as follows: if y(i) = y(j) (remember that y is now a vector), then remove y(j) and set p_out(i) = p_out(i) + p_out(j). A dirty way of doing this is with a for loop (since y is now sorted, we only need to compare each element with the one that follows it). However, I wonder if there exists a nicer way.
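For reference, the loop-based merge described above might look roughly like the following sketch. The input data and the names y_merged and p_merged are hypothetical, chosen only to illustrate the idea on an already sorted vector.

```matlab
% Hypothetical sorted data standing in for the sorted y and p_out vectors
y     = [1 1 2 2 3 5 5 5];
p_out = [.1 .1 .2 .1 .2 .1 .1 .1];

% Start with the first entry, then walk through the rest
y_merged = y(1);
p_merged = p_out(1);
for k = 2:numel(y)
    if y(k) == y_merged(end)
        % Same value as the previous entry: accumulate its probability
        p_merged(end) = p_merged(end) + p_out(k);
    else
        % New value: start a new entry
        y_merged(end+1) = y(k);
        p_merged(end+1) = p_out(k);
    end
end
```

This works, but it grows the output arrays inside the loop, which is part of why a vectorized alternative is attractive.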

I know that unique would remove the duplicated elements of a vector (hence if we only needed y it would be sufficient). I also know that it returns two index vectors that somehow indicate the position of duplicated elements. However, I cannot think of any nice way to use its outputs to appropriately merge the elements of p as well.

Maybe this can help: http://stackoverflow.com/questions/18639518/generate-and-plot-the-empirical-joint-pdf-and-cdf-in-matlab — Luis Mendo, Dec 16 '13 at 16:32

Luis Mendo · Accepted Answer · 2013-12-16T16:47:31.057

If I understand correctly, this is a job for accumarray:

y = [1 3 2 4 2 5 6 5 5 1]; %// example data
p = [.1 .5 .3 .2 .4 .1 .1 .2 .1 .3]; %// example data

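[y_unique, ~, jj] = unique(y); %// jj(k) is the index into y_unique of y(k)
p_summed = accumarray(jj(:), p(:)).'; %// sum p over equal values of y; (:) forces columns in any MATLAB version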

Result:

>> y_unique

y_unique =

     1     2     3     4     5     6

p_summed =

    0.4000    0.7000    0.5000    0.2000    0.4000    0.1000
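Applied to the problem in the question, the same idiom could be used roughly as follows. This is only a sketch: the distribution and the choice of f as addition are hypothetical, and y_unique and p_summed are just illustrative names.

```matlab
% Hypothetical example data and function, for illustration only
x = [0 1 2];
p = [0.2 0.5 0.3];
f = @(a, b) a + b;

% Build all pairwise outcomes and their probabilities
[X1, X2] = ndgrid(x, x);
y = f(X1, X2);
p_out = p(:) * p(:).';

% unique returns the sorted distinct values, so no separate sort is needed;
% jj maps every element of y(:) to its position in y_unique
[y_unique, ~, jj] = unique(y(:));
p_summed = accumarray(jj, p_out(:));   % sum probabilities of equal outcomes
```

Note that unique compares values exactly, so outcomes that differ only by floating-point rounding are treated as distinct.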

edited Dec 16 '13 at 16:47

answered Dec 16 '13 at 16:40

Luis Mendo


Thanks a lot. I was exactly looking for such a function! – MikeL Dec 16 '13 at 16:44
This `unique`/`accumarray` usage is perhaps my favorite MATLAB idiom. +1 – chappjc Dec 16 '13 at 18:50
@chappjc It would be nice if it could be combined into a single line, for example if the desired output of `unique` was the first, not the third – Luis Mendo Dec 16 '13 at 18:53
Nah, I think two lines is just fine! – chappjc Dec 16 '13 at 18:55
