11

I would like to get the number of different values found in a List.

For example:

The output for the List a={1,2,3,4,5} would be 5 whereas it would be 2 for b={1,1,1,2,2}.

Smi
500

4 Answers

14

Just for amusement, all the following commands also give the desired result:

Length@Gather@l

Length@Union@l

Length@Tally@l

Count[BinCounts@l, Except@0]

Count[BinLists@l, Except@{}]

Length@Split@Sort@l

Length@GatherBy[l, # &]

Length@Split@SortBy[l, # &]

And many more, of course.
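As a quick sanity check, several of the commands above can be applied to the list b from the question. This is only a sketch: the name counts is introduced here for illustration, and the BinCounts variant is included on the assumption that the data are integers, since unit-width bins cannot separate two distinct reals that fall into the same bin.

```mathematica
(* The example list from the question; it contains two distinct values *)
b = {1, 1, 1, 2, 2};

(* Apply several of the commands above to the same list *)
counts = {
   Length@Gather@b,
   Length@Union@b,
   Length@Tally@b,
   Count[BinCounts@b, Except@0],  (* integer data assumed *)
   Length@Split@Sort@b,
   Length@GatherBy[b, # &]
   };

(* Check that every command reports 2 distinct values *)
counts === ConstantArray[2, Length[counts]]
```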

Edit

Here is a little timing experiment (not a serious benchmark):

l = RandomInteger[{1, 10^2}, 10^7];  (* 10^7 integers drawn from 1..100 *)
SetAttributes[t2, HoldAll]  (* hold the argument so it can be both timed and printed *)
t2[x_] := {Timing[x], ToString[HoldForm@x]};
Grid[Reverse /@
  {t2[Length@DeleteDuplicates[l]],
   t2[Length@Tally[l]],
   t2[Length@Gather[l]],
   t2[Count[BinCounts[l], Except@0]],
   t2[Length@Union[l]],
   t2[Length@Split@Sort@l],
   t2[Count[BinLists[l], Except@{}]]},
 Frame -> All]

[Image: grid of timings produced by the code above]

BTW: Note the difference between BinLists[ ] and BinCounts[ ]

Edit

A more detailed view of DeleteDuplicates vs Tally:

t = Timing;
ListLinePlot@Transpose@
  Table[l = RandomInteger[{1, 10^i}, 10^7];
   {Log@First@t@Length@DeleteDuplicates@l,
    Log@First@t@Length@Tally@l},
   {i, Range[7]}]

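Beware: this is a log plot! The vertical axis shows the logarithm of the timing.

[Image: log plot of the DeleteDuplicates and Tally timings for i = 1 to 7]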

Dr. belisarius
  • Thank you, Belisarius! Which would be your favorite? The fastest? – 500 May 27 '11 at 20:29
  • @500 As was mentioned before, `DeleteDuplicates[ ]` is the fastest, AFAIK. These are just to show the OP some other ways to do the same thing. – Dr. belisarius May 27 '11 at 20:32
  • @Belisarius, as Sjoerd only compared it to Union, I was not sure about your other solutions. Thank you! – 500 May 27 '11 at 20:44
  • @Belisarius: This is impressive. I also discovered that very elegant HoldForm, thank you. – 500 May 28 '11 at 01:56
13

Use DeleteDuplicates (or Union in older versions) to remove duplicate elements. You can then count the elements in the returned list.

In[8]:= Length[DeleteDuplicates[a]]
Out[8]= 5

In[9]:= Length[DeleteDuplicates[b]]
Out[9]= 2
kennytm
9
Length[DeleteDuplicates[a]]

would do the trick. Depending on what else you're going to do, you could use Union or Tally instead of DeleteDuplicates.
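To make the choice between the three functions concrete, here is a minimal sketch of what each one returns. The list c is a hypothetical example, not part of the original question.

```mathematica
c = {3, 1, 3, 2, 1};  (* hypothetical example list *)

DeleteDuplicates[c]  (* distinct values in order of first appearance: {3, 1, 2} *)
Union[c]             (* distinct values in sorted order: {1, 2, 3} *)
Tally[c]             (* each distinct value with its multiplicity: {{3, 2}, {1, 2}, {2, 1}} *)
```

All three results have length 3, so any of them works for counting. Tally is the natural choice if the multiplicities are needed afterwards, and Union if the distinct values should come out sorted.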

Brett Champion
  • It may be good to note that DeleteDuplicates can be 20 times as fast as Union. Union returns a sorted list, whereas DeleteDuplicates keeps the resulting values in their original order. – Sjoerd C. de Vries May 27 '11 at 16:10
  • @Sjoerd, great point. It can be far more if the list is mostly duplicates. Try: `RandomInteger[999, 150000]`. – Mr.Wizard May 27 '11 at 18:27
  • @Sjoerd, @Mr.Wizard What you observed is entirely due to the data being a packed array. If we take @Mr.Wizard's example `rnd = RandomInteger[999, 150000];` and do `rnd[[100000]] = 1/2;` before running the benchmarks, `DeleteDuplicates` is still faster, but only by a factor of `3` or so, which is probably due to its linear complexity as compared to the `n log n` complexity of `Union`. – Leonid Shifrin May 29 '11 at 10:05
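The comparison described in the comments can be sketched as follows. The names packedTimes and unpackedTimes are introduced here for illustration. Absolute timings depend on the machine and the Mathematica version, so no particular ratio is claimed.

```mathematica
(* Packed case: RandomInteger returns a packed array of machine integers *)
rnd = RandomInteger[999, 150000];
Developer`PackedArrayQ[rnd]

(* Timings in seconds for DeleteDuplicates and Union on the packed list *)
packedTimes = {
   First@AbsoluteTiming[DeleteDuplicates[rnd];],
   First@AbsoluteTiming[Union[rnd];]};

(* Unpacked case: a single rational element forces the list to unpack *)
rnd[[100000]] = 1/2;
Developer`PackedArrayQ[rnd]

(* Same measurements on the unpacked list *)
unpackedTimes = {
   First@AbsoluteTiming[DeleteDuplicates[rnd];],
   First@AbsoluteTiming[Union[rnd];]};

(* Ratio Union/DeleteDuplicates in each case *)
{packedTimes[[2]]/packedTimes[[1]], unpackedTimes[[2]]/unpackedTimes[[1]]}
```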
1
CountDistinct[a]

would also do the trick. CountDistinct was introduced in Mathematica 10.0 and is equivalent to

Length[DeleteDuplicates[a]]
RatonWasher