
I'm trying to identify which specific word (from a list) was found in a sentence string.

I've imported a list of (inappropriate) words, which is compared against an input sentence to see whether any of those words appears in it (using a basic if statement). It works well (code below), but now I need to identify which word was actually found so that I can use it in the output.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from inappropriate_words import inappropriate # a list of inappropriate words
import sys

message = ' '.join(sys.argv[1:]) # the input message already converted to lowercase
message = message.replace(".", "") # to remove the full stop as well
#print (message) #to test if needed

if any(word in message.split() for word in inappropriate):
    print "SAMPLE WORD is inappropriate."

An example would be:
Input: "Do you like cookies"
Process: "cookies" is on the inappropriate list, so it is identified and the if statement triggers
Output: "Cookies is inappropriate." # I love cookies BTW

Yu Hao
  • Here's a [reference question](http://stackoverflow.com/questions/8845245/high-performance-mass-short-string-search-in-python) when performance becomes an issue. Incidentally, the question there already answers yours. – quazgar Aug 09 '15 at 13:16

1 Answer

I would store the inappropriate words in a set and then simply do a lookup per word, which is O(1) on average, as opposed to O(n) when using a list:

st = set(inappropriate)
message = ' '.join(sys.argv[1:]) # the input message already converted to lowercase
message = message.replace(".", "") # to remove the full stop as well

for word in message.split():
    if word in st:
        print "{} is inappropriate.".format(word)

If you only want to know whether any word matches, add a break after the print; to see all the matching words, use the loop as is.
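As a minimal sketch of the early-exit variant, the loop can be wrapped in a function that returns the first match instead of breaking. The function name `first_inappropriate` and the sample word list are hypothetical stand-ins for the imported `inappropriate` list, and the `print()` call assumes Python 3 syntax rather than the Python 2 `print` statement used above.

```python
def first_inappropriate(message, banned):
    """Return the first word of message found in banned, or None."""
    for word in message.split():
        if word in banned:
            return word  # stop at the first hit, like adding a break
    return None


# Hypothetical stand-in for the imported `inappropriate` list.
st = set(["cookies", "cake"])

found = first_inappropriate("do you like cookies and cake", st)
if found is not None:
    print("{} is inappropriate.".format(found))
```

Returning `None` when nothing matches lets the caller tell "no hit" apart from a hit without a separate flag.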

You can also use set.intersection to find all the common words:

comm = st.intersection(message.split()) 
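To show how the result of `set.intersection` might be turned into output, here is a small sketch. The word list and message are made-up sample data, not the real `inappropriate` list, and `print()` is written for Python 3.

```python
# Hypothetical stand-in for the imported `inappropriate` list.
st = set(["cookies", "cake", "pie"])

message = "do you like cookies and cake"

# Words present in both the banned set and the message.
comm = st.intersection(message.split())

# Sets are unordered, so sort to get a stable report.
for word in sorted(comm):
    print("{} is inappropriate.".format(word))
```

Note that the intersection loses the order in which the words appeared in the message and collapses repeated words into one entry.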

Lastly, instead of joining and replacing, you can strip punctuation off each word and iterate over sys.argv[1:] directly:

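from string import punctuation
import sys

from inappropriate_words import inappropriate # a list of inappropriate words

st = set(inappropriate) # build the set once so each lookup is O(1) on average

for word in sys.argv[1:]:
    cleaned = word.strip(punctuation) # drop leading/trailing punctuation only
    if cleaned in st:
        print "{} is inappropriate.".format(cleaned)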
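The same idea can be sketched as a function that takes the argument list explicitly, which makes it easy to try without the command line. The name `flag_words` and the sample data are hypothetical, and the `.lower()` call is an added assumption for input that has not already been lowercased.

```python
from string import punctuation


def flag_words(args, banned):
    """Return the cleaned words from args that appear in banned."""
    hits = []
    for word in args:
        # strip() only removes leading/trailing punctuation, so an
        # apostrophe inside a word such as "don't" is left in place.
        cleaned = word.strip(punctuation).lower()
        if cleaned in banned:
            hits.append(cleaned)
    return hits


# Hypothetical stand-ins for the inappropriate list and sys.argv[1:].
st = set(["cookies"])
args = ["Do", "you", "like", "cookies?"]
flagged = flag_words(args, st)
```

Collecting the hits in a list keeps them in message order, unlike the set intersection.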
Padraic Cunningham
  • You probably mean `st = set(...)` in line 1? And this may be `O(1)` with respect to the number of inappropriate words, but `O(k)` for messages of `k` words. It might be even faster to split the message into a set instead. – quazgar Aug 09 '15 at 13:08
  • @quazgar, yep, a typo. I meant that the set lookups are O(1), as stated in the first line. I'm not sure which would be more efficient; it all depends on the size. Either way, this will be more efficient than the OP's approach – Padraic Cunningham Aug 09 '15 at 13:18
  • Thank you. I didn't use the set though. Still used my imported list. It's not a huge list, but will keep this in mind if it becomes one. – Renier Delport Aug 09 '15 at 13:36