In the Q for Mortals chapter on data normalisation (that is, eliminating duplication in a list), the book recommends using enumerations to find the distinct values in a list, since it is faster to traverse integers than symbols of variable length. I tested this as follows:
u:`g`ibm`intl`msft / unique list of tickers
v:1000000?u / list with duplicate tickers
k:u?v / positions in u
\t:10 distinct v / performing distinct on symbols 10 times and timing
\t:10 distinct k / performing distinct on positions 10 times and timing
I find that distinct v is much faster than distinct k, which is not in line with what was promised.
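For comparison, the same timing could also be run on an actual enumeration built with $ rather than on the plain positions returned by ?. This is only a minimal sketch, assuming u and v are defined as above; the name e is my own and I make no claim about which timing comes out ahead:

```q
e:`u$v             / enumerate v against the domain u (e is an enumerated list, not a list of longs)
\t:10 distinct e   / time distinct on the enumeration, 10 runs
value[e]~v         / check that the enumeration resolves back to the original symbols
```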
Thanks for the help.