Testing whether a string has repeated characters

Question

I am trying to figure out the lightest method for determining whether a string has any repeated characters, in the lightest way possible. I have tried searching for similar questions, but cant find any. It also needs to be the shorted way possible, as i will be checking quite a few strings (I can handle putting this into a loop, etc.)

For example:

a = "12348546478"
#code to check multiple characters
print(result)

Results: 8 was repeated, 4 was repeated

The code will check what character was repeated and print out what was repeated. I don't need to know how many times it was repeated, just whether it was or was not repeated.

I have tried it myself, but i am just not very good at python, i am quite new. The code has too many steps and is making the loops huge. I was looking for a shorter pythonic way. — Armageddon80, Aug 19 '15 at 08:21

score 30 · Answer 1 · answered Aug 19 '15 at 08:27

30

Or alternatively you could do

len(set(x)) == len(x)

This returns a boolean, True if the string has no repeating characters, False otherwise.

The set type can't have any duplicates so when the string gets turned into one, it gets broken down into characters. The difference in length shows how many repeated characters there were (But NOT the characters themselves)

answered Aug 19 '15 at 08:27

muddyfish

3,530
30
37

This is a good answer, but in my situation it is not what i need. – Armageddon80 Aug 19 '15 at 08:32
To anyone else who might want this, my answer will be more quick then @Kasramvd's answer but theirs will be more flexible. – muddyfish Aug 19 '15 at 08:34
This is not an answer to this question at all. – Mazdak Aug 19 '15 at 08:55

Mazdak · Accepted Answer · 2015-08-19T08:51:05.570

You can use collections.Counter :

>>> from collections import Counter
>>> [i for i,j in Counter(a).items() if j>1]
['4', '8']

Or you can use a custom function :

>>> def finder(s):
...    seen,yields=set(),set()
...    for i in s:
...      if i in seen:
...         if i not in yields:
...            yield i
...            yields.add(i)
...         else :
...            yields.add(i)
...      else:
...          seen.add(i)
... 
>>> list(finder(a))
['4', '8']

Or use str.count method in a set comprehension :

>>> set(i for i in a if a.count(i)>1)
set(['8', '4'])

A benchmark on all approaches, which shows that the last 2 way (custom function and set comprehensions are much faster than Counter):

from timeit import timeit


s1="""
a = "12348546478"
[i for i,j in Counter(a).items() if j>1]

"""
s2="""
def finder(s):
    seen,yields=set(),set()
    for i in s:
      if i in seen:
         if i not in yields:
            yield i
            yields.add(i)
         else :
            yields.add(i)
      else:
          seen.add(i)

a = "12348546478"
list(finder(a))

"""

s3="""
a = "12348546478"
set(i for i in a if a.count(i)>1)
"""

print '1st: ' ,timeit(stmt=s1, number=100000,setup="from collections import Counter")
print '2nd : ',timeit(stmt=s2, number=100000)
print '3rd : ',timeit(stmt=s2, number=100000)

result :

1st:  0.726881027222
2nd :  0.265578985214
3rd :  0.26243185997

I also tried this for long string (a = "12348546478"*10000) and still got the same result:

1st:  25.5780302721341
2nd :  11.8482989001177
3rd :  11.926538944245

Any way my suggestion is using the set comprehension which is more pythonic :

set(i for i in a if a.count(i)>1)

Intead of `set(i for i in a if a.count(i)>1)`, I would use a set comprehension and use `{i for i in a if a.count(i)>1}` instead — muddyfish, Aug 19 '15 at 08:44
There is updates on the Performance measurement - please see below (last Post). — Daniel Hao, Feb 26 '21 at 03:48

score 2 · Answer 3 · answered Aug 19 '15 at 08:38

you can also use dictionary to get the count of unique characters as the key in a dictionary is always unique.

import collections

d = collections.defaultdict(int)
for c in a:
    d[c] += 1

d will contain {'1': 1, '3': 1, '2': 1, '5': 1, '4': 3, '7': 1, '6': 1, '8': 2}

And the answer given by Kasramvd is a nice approach.

ASB · Answer 4 · 2018-10-01T09:05:25.780

You can use the function below to check a character repetition. It returns True if there is no repetition of character and returns False otherwise.

Python Code

def isThereRepitition(x):
   for char in x: #copies and iterates passing a value to char everytime
       x=x[1:] #deletes the first character in the string x
       if char in x: #checks if there is char in x string
           return False
return True

Subham · Answer 5 · 2021-02-26T06:43:09.310

Simplifying @Kasravnd Second Answer,

First approach :

def finder(s):
    seen,yields=set(),set()
    for i in s:
      if i not in seen:
         seen.add(i)
         
      elif i not in yields:
         yield i
         yields.add(i)
         
a = "12348546478"
print(list(finder(a)))

Second approach

def finder(s):
    seen,yields=set(),set()
    for i in s:
      if i in seen and i not in yields:
          yield i
          yields.add(i)
      else:
          seen.add(i)
    
a = "12348546478"
print(list(finder(a)))

Third approach

def finder(s):
    yield from {i for i, v in enumerate(s) if v in s[i+1:]}

a = "12348546478"
print(list(set(a[i] for i in finder(a))))

all produce what are duplicates

['4', '8']

[Program finished]

@muddyfish is the easiest way to check if there exits a duplicate.

score 0 · Answer 6 · answered Sep 30 '18 at 10:26

0

import collections


 a = "12348546478"
 countOfWords = collections.Counter(a)
 result = [i for i in countOfWords if countOfWords[i]>1]
 result

Try this

answered Sep 30 '18 at 10:26

ambati charishma

1

score 0 · Answer 7 · answered Jan 27 '21 at 03:14

0

Updates from the `future`. (Jan 26, 2021)

Just out of curiosity, re-run this test with 2 changes in Python3.8 tonight and got very different resultls:

change 1- from collections import Counter # import this first
change 2 - make a larger number string: a = "123485464781233299345355234234355234458"

results:
1st:  0.4764095
2nd :  0.6692353
3rd :  0.6512726000000002

answered Jan 27 '21 at 03:14

Daniel Hao

4,922
3
10
23

1

So it seems Counter is the faster one. Loading lib first did make difference. Thx for the updates. – Jan 27 '21 at 17:57

score 0 · Answer 8 · answered Nov 29 '21 at 11:26

0

Or with numpy's unique:

import numpy as np
chars, times = np.unique(list("12348546478"), return_counts = True)
chars[times > 1]

answered Nov 29 '21 at 11:26

schotti

155
5

Testing whether a string has repeated characters

8 Answers8

Updates from the `future`. (Jan 26, 2021)

Linked

Testing whether a string has repeated characters

8 Answers8

Updates from the future. (Jan 26, 2021)

Linked

Updates from the `future`. (Jan 26, 2021)