
I have an n-by-1 vector where n = 20000. I would like to do a decile ranking for the data in this vector, which is basically replacing the value of each element by its corresponding decile.

I am currently doing it this way:

deciles = quantile(X,9);
X = discretize(X,[-inf deciles inf]);

Here `X` is my array of data. I'm doing this because I want 10 groups of data with the same number of elements in each.

Can you validate this procedure or let me know if there is a more robust way to do so?

Dan
Tulkkas
  • You can easily validate this yourself: just construct a small sample `X` and see if you get the correct results. One thing I would suggest immediately is not to do this in-place, i.e. don't overwrite `X` but rather make a new variable, say `X_dec`, so that you can compare it against `X`, which will help you validate your procedure yourself – Dan May 11 '16 at 08:54
  • Yes, I did that already and it works fine for a small sample. But with a huge data set, things can sometimes go wrong! I just wanted to hear from others whether this makes sense or whether there is another way to do it. Thanks for the comment! – Tulkkas May 11 '16 at 08:59

1 Answer


You can easily verify that what you have is correct by creating simple data of a known size.

nGroups = 10;
nPerGroup = 10000;

X = linspace(0, 1, nGroups * nPerGroup);

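deciles = quantile(X, nGroups - 1);
groups = discretize(X, [-inf deciles inf]);   % keep X intact; store the labels separately

% Count the elements in each group (new name, so nPerGroup stays a scalar for later use)
counts = arrayfun(@(g) sum(groups == g), 1:nGroups)
%// 10000   10000   10000   10000   10000   10000   10000   10000   10000   10000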

Alternatively, you can sort your data and then reshape it so that the number of columns equals the number of desired groups. This approach relies only on built-in functions, but it requires the number of elements to be divisible by the number of groups.

X = linspace(0, 1, nGroups * nPerGroup);
Y = reshape(sort(X), [], nGroups);

Each column is then a different group.
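
Note that the sorted-and-reshaped matrix groups the values but no longer tells you which original element landed in which group. If you need a decile label per original element (as in the question), a rank-based variant is one option. The following is only a sketch, not something from the question: the variable names `order`, `ranks`, `X_dec` and `counts` are made up for illustration, and the random input is just sample data.

```matlab
% Sketch: rank-based decile labels, keeping the original element order.
% Sample data (assumption): 20000 random values, as in the question's size.
X = rand(20000, 1);
nGroups = 10;

% order(k) is the index in X of the k-th smallest value
[~, order] = sort(X);

% ranks(i) is the position (1..n) of X(i) in the sorted data
ranks = zeros(size(X));
ranks(order) = 1:numel(X);

% Map ranks 1..n onto labels 1..nGroups in equal-sized blocks
X_dec = ceil(ranks * nGroups / numel(X));

% Number of elements per label
counts = accumarray(X_dec, 1);
```

Because the labels come from ranks rather than from bin edges, tied values are split between neighbouring groups in sort order instead of all falling into the same bin, and when the length is not divisible by `nGroups` the group sizes differ by at most one.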

Suever