Python: Return the words in a string that occur exactly once

Question

Let's say I have a function that takes in some string, and then I need to return the set of words in this string that occur exactly once. What is the best way to go about doing this? Would using dict be helpful? I've tried some pseudocode like:

counter = {}
def FindWords(string):
    for word in string.split()
        if (word is unique): counter.append(word)
return counter

Is there a better way to implement this? Thanks!

edit:

Say I have: "The boy jumped over the other boy". I want to return "jumped," "over," and "other."

Also, I'd like to return this as a set, and not a list.

Say I have a set of words like: "The boy jumped over the other boy". I want to return "jumped," "over," and "other." — J. P., Oct 03 '17 at 22:24

James · Accepted Answer · 2017-10-04T00:51:10.697

3

You can use the Counter from collections and return a set of the words that occur only once.

from collections import Counter

sent = 'this is my sentence string this is also my test string'

def find_single_words(s):
    c = Counter(s.split(' '))
    return set(k for k,v in c.items() if v==1)

find_single_words(sent)
# returns:
{'also', 'sentence', 'test'}

To do this with just the base Python utilities, you can use a dictionary to keep count of the occurrences, replicating the functionality of Counter.

sent = 'this is my sentence string this is also my test string'

def find_single_words(s):
    c = {}  # maps each word to the number of times it occurs
    for word in s.split(' '):
        if word not in c:
            c[word] = 1
        else:
            c[word] = c[word] + 1
    return [k for k, v in c.items() if v == 1]

find_single_words(sent)
# returns:
['sentence', 'also', 'test']

edited Oct 04 '17 at 00:51

answered Oct 03 '17 at 22:22

James


Is there a way to do this without importing outside tools like Counter? – J. P. Oct 03 '17 at 22:24

@J.P. `collections` is part of the standard library, it is not really an outside tool – James Oct 03 '17 at 22:24
@J.P. i added an additional part to my answer, see above – James Oct 03 '17 at 22:31
Hi, thanks! Do you know how you would change this if you wanted to return a set instead of a list? Instead of c.items(), could you return a set instead? – J. P. Oct 03 '17 at 22:39
@J.P. sure, i modified the second part of my answer to return a set – James Oct 04 '17 at 00:50
Great, thanks a lot! when you use: return[k for k,v in c.items() if v==1], is v being newly defined here as an index of c? – J. P. Oct 04 '17 at 03:20
@James, if you test it with the OP's input ("The boy jumped over the other boy"), your code returns `{'The', 'jumped', 'other', 'over', 'the'}`, which is not what the OP wanted. The words should be converted to lowercase, then look for their frequency. – srikavineehari Oct 15 '17 at 07:49
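As the last comment points out, the counts in the answer are case-sensitive. A minimal sketch of one way to handle that, assuming it is acceptable to lowercase the whole string before counting (the function name `find_single_words_ci` is made up for this example):

```python
from collections import Counter

def find_single_words_ci(s):
    # Lowercase first so 'The' and 'the' are counted as the same word.
    # split() with no argument splits on any run of whitespace.
    counts = Counter(s.lower().split())
    return {word for word, n in counts.items() if n == 1}

# Prints a set containing 'jumped', 'over' and 'other' (set order may vary).
print(find_single_words_ci('The boy jumped over the other boy'))
```

Whether lowercasing is the right normalization depends on the input; punctuation attached to words is not handled here.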

Bill Bell · Answer 2 · 2017-10-03T22:40:46.840

0

This might be what you have in mind.

>>> counts = {}
>>> sentence =  "The boy jumped over the other boy"
>>> for word in sentence.lower().split():
...     if word in counts:
...         counts[word]+=1
...     else:
...         counts[word]=1
...         
>>> [word for word in counts if counts[word]==1]
['other', 'jumped', 'over']
>>> set([word for word in counts if counts[word]==1])
{'other', 'jumped', 'over'}

But using defaultdict from the collections module, as someone else suggested, is nicer.
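A rough sketch of what the `defaultdict` variant might look like, assuming the same lowercase-and-split preprocessing as the loop above; `unique_words` is just an illustrative name:

```python
from collections import defaultdict

def unique_words(sentence):
    # defaultdict(int) supplies 0 for missing keys, so no if/else is needed.
    counts = defaultdict(int)
    for word in sentence.lower().split():
        counts[word] += 1
    # Keep only the words whose count is exactly 1.
    return {word for word, n in counts.items() if n == 1}

# Prints a set containing 'jumped', 'over' and 'other' (set order may vary).
print(unique_words("The boy jumped over the other boy"))
```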

edited Oct 03 '17 at 22:40

answered Oct 03 '17 at 22:29

Bill Bell


Uniques should not be giving "the" or "boy." It should only give "jumped," "over", and "other." – J. P. Oct 03 '17 at 22:31
Thank you! Do you know how to return this as a set rather than a list? – J. P. Oct 03 '17 at 22:40
Added that in. set() changes a list to a set. – Bill Bell Oct 03 '17 at 22:41

TubbyStubby · Answer 3 · 2017-10-03T22:53:31.487

0

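s = 'The boy jumped over the other boy'
def func(s):
    l = []
    s = s.split(' ')  # add s = s.lower() before this line to ignore case
    for i in range(len(s)):
        # keep s[i] only if it appears neither after nor before position i
        # (s[:i] is empty when i == 0, whereas s[i-1::-1] would wrap around to the whole list)
        if s[i] not in s[i+1:] and s[i] not in s[:i]:
            l.append(s[i])
    return set(l)  # convert to set and return
print(func(s))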

This should work fine.

It checks, for each word, whether the same word appears anywhere ahead of it or behind it in the list; if not, the word is appended.

If you do not want case sensitivity, you can add s=s.lower() or s=s.upper() before splitting it.

edited Oct 03 '17 at 22:53

answered Oct 03 '17 at 22:35

TubbyStubby


Going through the entire word list for every word makes this an O(n^2) algorithm, which can get pretty slow as the input gets bigger. Using a dictionary to count the number of occurrences would scale to large inputs a lot better. – Bass Oct 03 '17 at 23:14
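To illustrate the point about scaling, here is one possible single-pass alternative. It is not the code from the answer above: it swaps the slice lookups for two sets (`seen` and `repeated`, names chosen for this sketch), so each word is handled in roughly constant time on average. Lowercasing is an assumption made here to match the example in the question.

```python
def words_occurring_once(s):
    seen = set()      # every word encountered so far
    repeated = set()  # words encountered more than once
    for word in s.lower().split():
        if word in seen:
            repeated.add(word)
        else:
            seen.add(word)
    # Words seen at least once, minus those that were seen again.
    return seen - repeated

# Prints a set containing 'jumped', 'over' and 'other' (set order may vary).
print(words_occurring_once('The boy jumped over the other boy'))
```

A dictionary of counts, as the comment suggests, gives the same result; the two-set version simply avoids storing the counts when only "once or more than once" matters.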

score 0 · Answer 4 · answered Oct 03 '17 at 22:44

0

You can try this:

s = "The boy jumped over the other boy"
s1 = {"jumped", "over", "other"}
final_counts = [s.count(i) for i in s1]

Output:

[1, 1, 1]


Ajax1234


srikavineehari · Answer 5 · 2017-10-15T08:05:12.083

Try this.

>>> sentence = "The boy jumped over the other boy"
>>> set(word for word in sentence.lower().split() if sentence.count(word) == 1)
{'other', 'over', 'jumped'}
>>>

Edit: This is easier to read, and it counts whole words in the lowercased list rather than substrings of the original sentence:

>>> sentence = 'The boy jumped over the other boy'
>>> words = sentence.lower().split()
>>> uniques = {word for word in words if words.count(word) == 1}
>>> uniques
{'over', 'other', 'jumped'}
>>> type(uniques)
<class 'set'>
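One detail worth noting when choosing between the two snippets: `str.count` counts non-overlapping substring matches, while `list.count` counts whole elements equal to the argument. A small sketch of the difference, using a made-up sentence (not from the question) where the two disagree:

```python
sentence = 'an ant ran'
words = sentence.lower().split()

# str.count('an') finds 'an' inside 'ant' and 'ran' as well, so it returns 3
# and 'an' is wrongly treated as a repeated word.
by_substring = {w for w in words if sentence.count(w) == 1}

# list.count('an') compares whole words only, so it returns 1.
by_word = {w for w in words if words.count(w) == 1}

print(by_substring)  # contains 'ant' and 'ran' only
print(by_word)       # contains 'an', 'ant' and 'ran'
```

With the sentence from the question both approaches happen to give the same set, which is why the difference is easy to miss.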
