142

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.itervalues():
        if values > 1: 
            # do something

Can I do better?

Maciej Ziarko
  • 11,494
  • 13
  • 48
  • 69
user225312
  • 126,773
  • 69
  • 172
  • 181

18 Answers18

222

Not the most efficient, but straight forward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won't make much of a difference for short lists.

yan
  • 20,644
  • 3
  • 38
  • 48
  • This is what I do as well. Probably not efficient for large lists although. – tkerwin Mar 11 '11 at 20:49
  • Not necessarily, that will execute the body of the conditional if the list has repeating elements (the "#do something" in the example). – yan Mar 11 '11 at 20:49
  • 2
    Fair enough, good solution. I am handling barely < 500 elements, so this should do what I want. – user225312 Mar 11 '11 at 20:54
  • 4
    For those worried about efficiency with long lists, this *is* efficient for long lists that are actually unique (where all elements need checking). Early exit solutions take longer (roughly 2x longer in my tests) for actually unique lists. So... if you expect most of your lists to be unique, use this simple set length checking solution. If you expect most of your lists to NOT be unique, use an early exit solution. Which one to use depends on your use case. – Russ Oct 17 '17 at 14:08
  • 4
    This answer is nice. However, let's be careful here: `len(x) > len(set(x))` is True when the elements in `x` are NOT unique. This question's title asks exactly the opposite: "Checking if all elements in a list *are* unique" – WhyWhat Apr 25 '20 at 20:54
119

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
PaulMcG
  • 62,419
  • 16
  • 94
  • 130
  • 10
    +1 clean and doesn't iterate through the whole list if not needed. – Kos Nov 29 '12 at 15:49
  • 1
    @paul-mcguire: Would you be willing to license this code snippet under an Apache 2.0-compatible license (e.g., Apache 2, 2/3-line BSD, MIT, X11, zlib). I'd like to use it in an Apache 2.0 project I'm using, and because StackOverflow's licensing terms are _fubar_, I'm asking you as the original author. – Ryan Parman Sep 25 '16 at 01:22
  • I've put out other code using MIT license, so that works for me for this snippet. Anything special I need to do? – PaulMcG Sep 25 '16 at 04:32
21

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

however for small cases or if early-exiting is not the common case then I would expect len(x) != len(set(x)) being the fastest method.

6502
  • 112,025
  • 15
  • 165
  • 265
  • I accepted the other answer as I was not particularly looking for optimization. – user225312 Mar 11 '11 at 21:00
  • 2
    You can shorten this by putting the following line after `s = set()`... `return not any(s.add(x) if x not in s else True for x in g)` – Andrew Clark Mar 11 '11 at 21:42
  • Could you explain why you would expect `len(x) != len(set(x))` to be faster than this if early-exiting is not common? Aren't both operations **O(len(x))**? (where `x` is the original list) – Chris Redford May 12 '12 at 14:55
  • Oh, I see: your method is not **O(len(x))** because you check `if x in s` inside of the **O(len(x))** for loop. – Chris Redford May 12 '12 at 14:58
18

for speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
jassinm
  • 7,323
  • 3
  • 33
  • 42
15

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)
Grzegorz Oledzki
  • 23,614
  • 16
  • 68
  • 106
8

Alternative to a set, you can use a dict.

len({}.fromkeys(x)) == len(x)
Tugrul Ates
  • 9,451
  • 2
  • 33
  • 59
3

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.

PaulMcG
  • 62,419
  • 16
  • 94
  • 130
  • hashing is faster than sorting – IceArdor Oct 23 '14 at 03:26
  • Came here to post the same solution using `groupby` and found this answer. I find this most elegant, since this a single expression and works with the built-in tools without requiring any extra variable or loop-statement. – Lars Blumberg Nov 21 '19 at 21:10
  • 1
    If your list contains arbitrary objects which are not sortable, you can use the `id()` function to sort them as this is a prerequisite for `groupby()` to work: `groupby(sorted(seq), key=id)` – Lars Blumberg Nov 21 '19 at 21:19
3

Here is a recursive O(N2) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(s[1:]) and (s[0] not in s[1:])
    return True
Karol
  • 1,246
  • 2
  • 13
  • 20
3

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution. If there are early duplicates in the list, there are some constant-time solutions which are to be preferred.

enter image description here

enter image description here


Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()
Nico Schlömer
  • 53,797
  • 27
  • 201
  • 249
2

Here is a recursive early-exit function:

def distinct(L):
    if len(L) == 2:
        return L[0] != L[1]
    H = L[0]
    T = L[1:]
    if (H in T):
            return False
    else:
            return distinct(T)    

It's fast enough for me without using weird(slow) conversions while having a functional-style approach.

mhourdakis
  • 179
  • 8
  • 1
    `H in T` does a linear search, and `T = L[1:]` copies the sliced part of the list, so this will be much slower than the other solutions that have been suggested on big lists. It is O(N^2) I think, while most of the others are O(N) (sets) or O(N log N) (sorting based solutions). – Blckknght Apr 28 '13 at 17:36
2

All answer above are good but I prefer to use all_unique example from 30 seconds of python

You need to use set() on the given list to remove duplicates, compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False
buhtz
  • 10,774
  • 18
  • 76
  • 149
ArunPratap
  • 4,816
  • 7
  • 25
  • 43
1

How about this

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1]==1
yilmazhuseyin
  • 6,442
  • 4
  • 34
  • 38
1

If and only if you have the data processing library pandas in your dependencies, there's an already implemented solution which gives the boolean you want :

import pandas as pd
pd.Series(lst).is_unique
Tom
  • 91
  • 7
0

Using a similar approach in a Pandas dataframe to test if the contents of a column contains unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dateframe containing over a million rows.

user1718097
  • 4,090
  • 11
  • 48
  • 63
0

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

 def f5(seq, idfun=None): 
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

canisrufus
  • 665
  • 2
  • 6
  • 19
  • this f5 function will be slower than using set which is better optimized for speed. This code starts to break when the list gets really large due to the expensive "append" operation. with large lists like `x = range(1000000) + range(1000000)`, running set(x) is faster than f5(x). Order is not a requirement in the question but even running sorted(set(x)) is still faster than f5(x) – Okezie Aug 16 '14 at 21:50
0

It does not fully fit the question but if you google the task I had you get this question ranked first and it might be of interest to the users as it is an extension of the quesiton. If you want to investigate for each list element if it is unique or not you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

for short lists the get_unique_using_count as suggested in some answers is fast. But if your list is already longer than 100 elements the count function takes quite long. Thus the approach shown in the get_unique function is much faster although it looks more complicated.

horseshoe
  • 1,437
  • 14
  • 42
0

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting for this purpose though.

Chris
  • 5,664
  • 6
  • 44
  • 55
-3

For begginers:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
Nikolay Fominyh
  • 8,946
  • 8
  • 66
  • 102
  • I like this answer, just because it shows quite well what code you don't have to write when using a set. I wouldn't label it "for beginners", as I believe beginners should learn to do it the right way up front; but I met some unexperienced developers who were used to writing such code in other languages. – cessor Jan 09 '20 at 16:26