Checking if all elements in a list are unique

Question

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():
...     if count > 1:
...         pass  # do something

Can I do better?

score 222 · Accepted Answer · answered Mar 11 '11 at 20:47

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
    pass  # do something

Probably won't make much of a difference for short lists.
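To make the direction of the test explicit, the comparison can be wrapped in two small helpers. This is only a sketch: the names has_duplicates and all_unique are illustrative and not part of the answer, and it assumes x is a sized collection of hashable elements (set() raises TypeError otherwise).

```python
def has_duplicates(x):
    # A set drops repeated elements, so it is shorter than x
    # exactly when x contains at least one duplicate.
    return len(x) > len(set(x))


def all_unique(x):
    # The opposite test: True when no element repeats.
    return len(x) == len(set(x))


print(has_duplicates([1, 1, 1, 2, 3, 4, 5, 6, 2]))  # True
print(all_unique([1, 2, 3]))  # True
```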

answered Mar 11 '11 at 20:47

yan

This is what I do as well. Probably not efficient for large lists, though. – tkerwin Mar 11 '11 at 20:49
Not necessarily, that will execute the body of the conditional if the list has repeating elements (the "#do something" in the example). – yan Mar 11 '11 at 20:49
2

Fair enough, good solution. I am handling barely < 500 elements, so this should do what I want. – user225312 Mar 11 '11 at 20:54
4

For those worried about efficiency with long lists, this *is* efficient for long lists that are actually unique (where all elements need checking). Early exit solutions take longer (roughly 2x longer in my tests) for actually unique lists. So... if you expect most of your lists to be unique, use this simple set length checking solution. If you expect most of your lists to NOT be unique, use an early exit solution. Which one to use depends on your use case. – Russ Oct 17 '17 at 14:08
4

This answer is nice. However, let's be careful here: `len(x) > len(set(x))` is True when the elements in `x` are NOT unique. This question's title asks exactly the opposite: "Checking if all elements in a list *are* unique" – WhyWhat Apr 25 '20 at 20:54

PaulMcG · Answer 2 · 2012-12-26T21:16:07.697

119

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
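If the input may mix hashable and unhashable elements, the two variants above could be combined. The following is a rough sketch, not the answer's own code: the name all_unique_any is made up, and it assumes a hashable element never compares equal to an unhashable one.

```python
def all_unique_any(items):
    seen_hashable = set()    # fast path: average O(1) membership test
    seen_unhashable = []     # slow path: linear scan
    for item in items:
        try:
            if item in seen_hashable:
                return False
            seen_hashable.add(item)
        except TypeError:
            # Unhashable element (e.g. a list): fall back to the list.
            if item in seen_unhashable:
                return False
            seen_unhashable.append(item)
    return True


print(all_unique_any([list("ABC"), list("DEF"), 1, 2]))   # True
print(all_unique_any([list("ABC"), "ABC", list("ABC")]))  # False
```

Only the unhashable elements pay the quadratic cost of the list lookup; everything else still goes through the set.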

edited Dec 26 '12 at 21:16

answered Mar 12 '11 at 09:12

PaulMcG

10

+1 clean and doesn't iterate through the whole list if not needed. – Kos Nov 29 '12 at 15:49
1

@paul-mcguire: Would you be willing to license this code snippet under an Apache 2.0-compatible license (e.g., Apache 2, 2/3-line BSD, MIT, X11, zlib). I'd like to use it in an Apache 2.0 project I'm using, and because StackOverflow's licensing terms are _fubar_, I'm asking you as the original author. – Ryan Parman Sep 25 '16 at 01:22
I've put out other code using MIT license, so that works for me for this snippet. Anything special I need to do? – PaulMcG Sep 25 '16 at 04:32

6502 · Answer 3 · 2011-03-11T21:08:22.497

21

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or when early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
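That expectation is easy to check on your own machine. Below is a minimal timing sketch using the standard timeit module; the function names, the list size of 100,000 and the repeat count are arbitrary choices for illustration, and the numbers will differ between machines and Python versions.

```python
import timeit


def len_set(x):
    return len(x) == len(set(x))


def early_exit(x):
    seen = set()
    for item in x:
        if item in seen:
            return False
        seen.add(item)
    return True


all_distinct = list(range(100_000))               # every element must be checked
early_duplicate = [0, 0] + list(range(100_000))   # duplicate right at the start

for name, data in [("all distinct", all_distinct),
                   ("early duplicate", early_duplicate)]:
    for func in (len_set, early_exit):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"{name:16} {func.__name__:10} {seconds:.4f} s")
```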

edited Mar 11 '11 at 21:08

answered Mar 11 '11 at 20:50

6502

I accepted the other answer as I was not particularly looking for optimization. – user225312 Mar 11 '11 at 21:00
2

You can shorten this by putting the following line after `s = set()`... `return not any(s.add(x) if x not in s else True for x in g)` – Andrew Clark Mar 11 '11 at 21:42
Could you explain why you would expect `len(x) != len(set(x))` to be faster than this if early-exiting is not common? Aren't both operations **O(len(x))**? (where `x` is the original list) – Chris Redford May 12 '12 at 14:55
Oh, I see: your method is not **O(len(x))** because you check `if x in s` inside of the **O(len(x))** for loop. – Chris Redford May 12 '12 at 14:58

jassinm · Answer 4 · 2012-11-29T22:50:58.323

18

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)  # False here: x contains duplicates

edited Nov 29 '12 at 22:50

answered Nov 29 '12 at 20:29

jassinm

score 15 · Answer 5 · answered Mar 11 '11 at 20:48

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

answered Mar 11 '11 at 20:48

Grzegorz Oledzki

1

Answered one second after yan, ouch. Short and sweet. Any reasons why not to use this solution? – jasonleonhard Sep 19 '17 at 21:25
Not all sequences (generators especially) support `len()`. – PaulMcG Apr 18 '18 at 12:42
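For inputs without len(), such as generators, a loop-based check still works because it only iterates. A small sketch (the name all_unique_iter is illustrative, and the elements are assumed to be hashable):

```python
def all_unique_iter(iterable):
    seen = set()
    for item in iterable:
        if item in seen:
            return False   # stop consuming the iterable at the first repeat
        seen.add(item)
    return True


squares = (n * n for n in range(-3, 4))   # 9, 4, 1, 0, 1, 4, 9
print(all_unique_iter(squares))           # False: 1 repeats

# Alternative: materialize the generator once, then compare lengths.
items = list(n * n for n in range(4))     # [0, 1, 4, 9]
print(len(items) == len(set(items)))      # True
```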

score 8 · Answer 6 · answered Mar 11 '11 at 20:50

As an alternative to a set, you can use a dict:

len(dict.fromkeys(x)) == len(x)

answered Mar 11 '11 at 20:50

Tugrul Ates

14

I see absolutely no advantage to using a dict over a set. Seems to unnecessarily complicate things. – metasoarous Feb 21 '17 at 08:14
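One practical difference, offered as a side note rather than as the answer's own argument: since Python 3.7 a dict preserves insertion order, so dict.fromkeys(x) also gives you the distinct elements in first-seen order, which a set does not guarantee. A tiny illustration (the variable name distinct_in_order is made up):

```python
x = [3, 1, 3, 2, 1]

# Keys keep the order in which each value first appeared.
distinct_in_order = list(dict.fromkeys(x))
print(distinct_in_order)                  # [3, 1, 2]
print(len(distinct_in_order) == len(x))   # False: x has duplicates
```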

score 3 · Answer 7 · answered Dec 27 '12 at 04:34

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.
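For readers who find the lambda dense, here is roughly the same logic spelled out as a loop. It is a sketch under the same assumptions as the one-liner (the elements must be orderable), and the name is_unique_sorted is only illustrative.

```python
from itertools import groupby


def is_unique_sorted(seq):
    # sorted() puts equal elements next to each other, so groupby()
    # yields exactly one group per distinct value.
    for _value, group in groupby(sorted(seq)):
        # A group with more than one member means a repeated value.
        if sum(1 for _ in group) > 1:
            return False
    return True


print(is_unique_sorted("ABCDEF"))   # True
print(is_unique_sorted("ABACDEF"))  # False
```

Note that the early exit only shortens the grouping pass; the O(N log N) sort always runs in full first.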

answered Dec 27 '12 at 04:34

PaulMcG

hashing is faster than sorting – IceArdor Oct 23 '14 at 03:26
Came here to post the same solution using `groupby` and found this answer. I find this most elegant, since this is a single expression and works with the built-in tools without requiring any extra variable or loop statement. – Lars Blumberg Nov 21 '19 at 21:10
1

If your list contains arbitrary objects which are not sortable, you can use the `id()` function to sort them, as sorting is a prerequisite for `groupby()` to work: `groupby(sorted(seq, key=id), key=id)`. Note that this compares objects by identity, not by equality. – Lars Blumberg Nov 21 '19 at 21:19

score 3 · Answer 8 · answered Dec 14 '14 at 05:51

Here is a recursive O(N²) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True

answered Dec 14 '14 at 05:51

Karol

Nico Schlömer · Answer 9 · 2023-01-03T16:46:03.537

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution. If there are early duplicates in the list, there are some constant-time solutions which are to be preferred.

Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()

score 2 · Answer 10 · answered Apr 28 '13 at 16:12

Here is a recursive early-exit function:

def distinct(L):
    # Fewer than two elements cannot contain a duplicate.
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)

It's fast enough for me without using weird (slow) conversions while having a functional-style approach.

answered Apr 28 '13 at 16:12

mhourdakis

1

`H in T` does a linear search, and `T = L[1:]` copies the sliced part of the list, so this will be much slower than the other solutions that have been suggested on big lists. It is O(N^2) I think, while most of the others are O(N) (sets) or O(N log N) (sorting based solutions). – Blckknght Apr 28 '13 at 17:36

score 2 · Answer 11 · edited May 30 '22 at 18:42

All the answers above are good, but I prefer the all_unique example from 30 seconds of python.

Use set() on the given list to remove duplicates, then compare its length with the length of the list.

def all_unique(lst):
    return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False

score 1 · Answer 12 · answered Nov 08 '12 at 09:03

How about this

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1
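A Counter becomes more useful when you also want to know which elements repeat, not just whether any do. A possible sketch (the helper name duplicates is made up for illustration; the elements must be hashable):

```python
from collections import Counter


def duplicates(lst):
    # Count every distinct element, then keep those seen more than once.
    return [item for item, count in Counter(lst).items() if count > 1]


x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(duplicates(x))       # [1, 2]
print(not duplicates(x))   # False: the list is not all unique
```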

answered Nov 08 '12 at 09:03

yilmazhuseyin

score 1 · Answer 13 · answered Mar 18 '22 at 16:59

If, and only if, you already have the data processing library pandas among your dependencies, there is an already implemented solution that gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique

answered Mar 18 '22 at 16:59

Tom

score 0 · Answer 14 · answered Apr 19 '16 at 22:38

Using a similar approach on a pandas DataFrame to test whether a column contains only unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int column in a dataframe containing over a million rows.

score 0 · Answer 15 · answered Mar 11 '11 at 20:51

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

This f5 function will be slower than using set, which is better optimized for speed. This code starts to break when the list gets really large due to the expensive "append" operation. With large lists like `x = list(range(1000000)) + list(range(1000000))`, running set(x) is faster than f5(x). Order is not a requirement in the question, but even running sorted(set(x)) is still faster than f5(x). – Okezie Aug 16 '14 at 21:50
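What f5 offers over a plain set is the idfun hook, which lets you decide what "the same element" means. If that is the goal, the hook can be kept without building the result list. The sketch below is not from the linked benchmark; all_unique_by and its key parameter are names invented here, and the computed markers are assumed to be hashable.

```python
def all_unique_by(seq, key=None):
    seen = set()
    for item in seq:
        # Compare elements by their marker, as f5 does with idfun.
        marker = item if key is None else key(item)
        if marker in seen:
            return False
        seen.add(marker)
    return True


print(all_unique_by(["a", "B", "A"]))                 # True
print(all_unique_by(["a", "B", "A"], key=str.lower))  # False: "a" and "A" collide
```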

score 0 · Answer 16 · answered Nov 29 '21 at 14:15

This does not fully fit the question, but if you google the task I had, this question ranks first, so it might be of interest to other users as an extension of the question. If you want to determine, for each list element, whether it is unique or not, you can do the following:

import numpy as np

def get_unique(mylist):
    # sort the values, keeping each original index alongside its value
    sort = sorted((e, i) for i, e in enumerate(mylist))
    last = len(sort) - 1
    # an element is unique if it differs from both of its sorted neighbours
    isunique = [
        [i, (k == 0 or e != sort[k - 1][0]) and (k == last or e != sort[k + 1][0])]
        for k, (e, i) in enumerate(sort)
    ]
    # restore the original order and return only the booleans
    return [flag for _, flag in sorted(isunique)]


def get_unique_using_count(mylist):
    return [mylist.count(item) == 1 for item in mylist]

# %timeit is an IPython/Jupyter magic; run the lines below in such a session
mylist = list(np.random.randint(0, 10, 10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0, 1000, 1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But once your list is longer than about 100 elements, the count function takes quite long. The approach shown in get_unique is then much faster, although it looks more complicated.
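If the elements are hashable, a third option is to count everything once with collections.Counter and then look each element up. This is a sketch of that idea, not something measured in this answer; get_unique_using_counter is a hypothetical name.

```python
from collections import Counter


def get_unique_using_counter(mylist):
    counts = Counter(mylist)   # a single pass counts every value
    # Each lookup is a dict access, so the whole function stays linear.
    return [counts[item] == 1 for item in mylist]


print(get_unique_using_counter([3, 1, 3, 2]))  # [False, True, False, True]
```

It keeps the readability of the count() version while avoiding the rescan of the whole list for every element.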

score 0 · Answer 17 · answered Feb 25 '22 at 15:57

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
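The same neighbour comparison can be written without index arithmetic by zipping the list with itself shifted by one. A short sketch (the function name all_unique_sorted is illustrative; the input is assumed to be sorted already):

```python
def all_unique_sorted(sorted_list):
    # zip() pairs each element with its successor; any() stops at the
    # first pair of equal neighbours.
    return not any(a == b for a, b in zip(sorted_list, sorted_list[1:]))


print(all_unique_sorted([1, 2, 3, 5]))  # True
print(all_unique_sorted([1, 2, 2, 5]))  # False
```

The slice copies the list; on Python 3.10 or newer, itertools.pairwise(sorted_list) yields the same pairs without the copy.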

answered Feb 25 '22 at 15:57

Chris

score -3 · Answer 18 · edited Mar 17 '16 at 11:20

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True

edited Mar 17 '16 at 11:20

Nikolay Fominyh

answered Nov 04 '15 at 14:37

DonChriss

I like this answer, just because it shows quite well what code you don't have to write when using a set. I wouldn't label it "for beginners", as I believe beginners should learn to do it the right way up front; but I have met some inexperienced developers who were used to writing such code in other languages. – cessor Jan 09 '20 at 16:26
