
I am using `hist` to count the occurrences of each value in a matrix in MATLAB.

I think I am using it wrong, because it gives me very strange results. Could you help me understand what is going on?

When I run this code, I get `countsB` as expected:

```matlab
rng default;
B=randi([0,3],10,1);
idxB=unique(B);
countsB=(hist(B,idxB))';
```

That is:

```matlab
B=[3;3;0;3;2;0;1;2;3;3];
idxB=[0;1;2;3];
countsB=[2;1;2;5];
```

When I run this other code, I get wrong results for `countsA`:

```matlab
A=ones(524288,1)*3418;
idxA=unique(A);
countsA=(hist(A,idxA))';
```

That is:

```matlab
idxA=3418;
countsA=[zeros(1709,1); 524288; zeros(1708,1)];
```

What am I doing wrong?

Luis Mendo
TEX

3 Answers


`idxA` is a scalar, and in this context `hist` interprets a scalar as the number of bins. If you set `idxA` to a vector instead, e.g. `[0,3418]`, you get a histogram with bins centered at 0 and 3418, just as you did with `idxB`, which was also a vector.
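As a minimal sketch of this point, reusing the `A` from the question (the names `nScalar` and `nVector` are only for illustration), the two calls below differ only in whether the second argument is a scalar or a vector:

```matlab
% Same data as in the question: 524288 copies of the value 3418
A = ones(524288, 1) * 3418;

% Scalar second argument: read as the NUMBER of bins (3418 bins here)
nScalar = hist(A, 3418);
disp(numel(nScalar))          % one entry per bin, almost all of them empty

% Vector second argument: read as bin CENTERS (here 0 and 3418)
nVector = hist(A, [0, 3418]);
disp(nVector)                 % every element lands in the bin centered at 3418
```

For vector input `hist` returns a row vector, which is why the question transposes the result with `'`.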

Yuval Harpaz

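To add to the other answers: you can replace `hist` with an explicit sum:

```matlab
idxA = unique(A);
% Compare every element of A with every unique value (N-by-K logical), then count matches in each column
countsA = sum(bsxfun(@eq, A(:), idxA(:).'), 1);
```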
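On MATLAB R2016b or later, implicit expansion lets you drop `bsxfun` and write the comparison directly. A sketch, assuming such a release and reusing the `B` data from the question (`A2` and `counts2` are illustrative names):

```matlab
% Requires implicit expansion (MATLAB R2016b or later)
A = [3; 3; 0; 3; 2; 0; 1; 2; 3; 3];   % the B vector from the question
idxA = unique(A);

% A(:) is N-by-1 and idxA(:).' is 1-by-K, so == expands to an N-by-K
% logical matrix; summing down each column counts matches per unique value
countsA = sum(A(:) == idxA(:).', 1);
disp(countsA)

% Also works when unique returns a single value (the failing case)
A2 = ones(524288, 1) * 3418;
counts2 = sum(A2(:) == unique(A2).', 1);
```

This still forms the full N-by-K logical matrix, so it has the same memory cost as the `bsxfun` version.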
Luis Mendo
  • Is it as fast as or faster than `hist` for `size(A,1)` very large? – TEX Mar 23 '17 at 12:39
  • I always [bet on `bsxfun`](https://meta.stackoverflow.com/a/303542/2586922), but better test yourself for your specific case – Luis Mendo Mar 23 '17 at 12:39
  • When `idxA` and `A` have huge lengths (e.g. 8000 and 1316966), it gives me an error: `Error using bsxfun Requested 1316966x8000 (9.8GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information` However, when the size limits are not exceeded, it is faster than `hist`. – TEX Mar 23 '17 at 13:01
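If the N-by-K comparison matrix is too large, one alternative (not what this answer proposes, but a standard MATLAB idiom) is to let `unique` return its index vector and count with `accumarray`. A minimal sketch; the random data below is made up purely to mimic the sizes in the comment, and `idxS`/`countsS` are illustrative names:

```matlab
% Illustrative data only: about 1.3 million values drawn from 8000 integers
A = randi([1, 8000], 1316966, 1);

% idxA holds the sorted unique values; ic maps each element of A to its
% position in idxA, so that idxA(ic) reproduces A
[idxA, ~, ic] = unique(A);

% accumarray adds 1 to element ic(i) of the output for every i, so
% countsA(k) is the number of times idxA(k) occurs in A.
% Memory grows with numel(A) + numel(idxA), not with their product.
countsA = accumarray(ic, 1);

% The same two calls also handle the single-value case from the question
[idxS, ~, icS] = unique(ones(524288, 1) * 3418);
countsS = accumarray(icS, 1);
```

Here `countsA` comes out as a column vector, whereas `hist` and the `bsxfun` version return a row; transpose it if needed.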

I think it has to do with:

N = HIST(Y,M), where M is a scalar, uses M bins.

and I think you are assuming it would do:

N = HIST(Y,X), where X is a vector, returns the distribution of Y
among bins with centers specified by X.

In other words, because `idxA` is the scalar 3418, MATLAB uses the first form and assumes you are asking for 3418 bins.
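A small check of this interpretation, using the data from the question: the result has one entry per requested bin, and only one bin is non-empty. The index of that bin follows from the `[zeros(1709,1); 524288; zeros(1708,1)]` output reported in the question:

```matlab
A = ones(524288, 1) * 3418;
idxA = unique(A);            % a single value, so idxA is the scalar 3418

% With a scalar second argument, hist(Y, M) uses M bins, so this asks for
% 3418 equally spaced bins around the data rather than one bin at 3418
countsA = hist(A, idxA)';

disp(numel(countsA))         % number of bins requested
disp(find(countsA))          % index of the only non-empty bin
```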

HelloWorld