
I'm currently working on an IRC bot for Twitch.tv and I was wondering how I can implement a banned words list. Here is what I have so far; I'm stumped because of my limited knowledge of Python. Everything works except checking whether banned words are in the message. This is the bit of code in question:

if bannedWords.split in message:
    sendMessage(s, "/ban " + user)
    break

I was thinking of checking a list to see if the message contains anything from the list?

bannedWords = ["badword1", "badword1"]

But I'm just not sure..

import string
from Read import getUser, getMessage
from Socket import openSocket, sendMessage
from Initialize import joinRoom

s = openSocket()
joinRoom(s)
readbuffer = ""
bannedWords = ["badword1", "badword1"]
while True:
        readbuffer = readbuffer + s.recv(1024)
        temp = string.split(readbuffer, "\n")
        readbuffer = temp.pop()

        for line in temp:
            print(line)
            if "PING" in line:
                s.send(line.replace("PING", "PONG"))
                break
            user = getUser(line)
            message = getMessage(line)
            print user + " typed :" + message
            if bannedWords.split in message:
                sendMessage(s, "/ban " + user)
                break

Thanks in advance!!

Aaron
  • Is bannedWords a list defined by you of the banned words? – Banach Tarski Apr 01 '16 at 16:48
  • Just words I add in. I'm kind of trying to take it slow at first. In the end maybe I can add a command to write words into a .txt and then read them off? – Aaron Apr 01 '16 at 16:56
  • You forgot to call split too; you are asking if a reference to `str.split` is in the message – Padraic Cunningham Apr 01 '16 at 17:04
  • That `temp.pop()` looks like it could be a problem area as well... – Jon Clements Apr 01 '16 at 17:06
  • Obligatory link: [Scunthorpe Problem](https://en.wikipedia.org/wiki/Scunthorpe_problem). Think carefully about how rigid you want your filtering system to be, because you may end up silencing legitimate conversation. – Kevin Apr 01 '16 at 17:08
  • @Kevin Thanks for the link, interesting! – Aaron Apr 01 '16 at 17:23
  • @Xarotic you've already created a list from `bannedWords` when you `.split()` it originally... (your `""".split()`...) – Jon Clements Apr 01 '16 at 17:34
  • You shouldn't edit the answers in to your question. It makes it hard for future readers to get what's happening. – Ilja Everilä Apr 01 '16 at 17:38
  • @Ilja I changed it back to the original! Sorry! – Aaron Apr 01 '16 at 17:50
  • @Xarotic don't worry about it, just a friendly reminder. A good question is probably read by many others later on, so if their problem is similar to yours, it's better to leave it intact and let the answers explain for themselves. – Ilja Everilä Apr 01 '16 at 17:59

2 Answers


Assuming both message and bannedWords are strings:

if any(map(message.__contains__, bannedWords.split())):
    ...

If, on the other hand, bannedWords is already a list, as in your code example, skip the splitting (the list type has no split method):

if any(map(message.__contains__, bannedWords)):
    ...

This checks whether any of the banned words occurs anywhere in the string, so "The grass is greener on the other side." will match banned words like "ass".

Note that map behaves differently between the two major Python versions:

  • In Python 2, map creates a list, which negates the advantage that the short-circuiting behaviour of any would provide. Use a generator expression instead: any(word in message for word in bannedWords).
  • In Python 3, map creates an iterator that lazily applies the function over the given iterable.
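The generator-expression form mentioned above could be sketched as follows. It should behave the same on Python 2 and Python 3. The helper name `contains_banned_substring` and the sample word list are made up for illustration and are not part of the original bot.

```python
def contains_banned_substring(message, banned_words):
    """Return True if any banned word occurs anywhere in message."""
    # any() stops at the first hit because the generator is evaluated lazily.
    return any(word in message for word in banned_words)


banned_words = ["ass", "badword1"]  # hypothetical sample list

hit = contains_banned_substring("you said badword1", banned_words)

# Substring matching also flags innocent text: "grass" contains "ass".
false_positive = contains_banned_substring(
    "The grass is greener on the other side.", banned_words)

clean = contains_banned_substring("hello chat", banned_words)
```

Here `hit` and `false_positive` both come out true, while `clean` is false, which is the over-matching behaviour described above.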

P.S.

About the bannedWords.split(): it is common to see lists of words generated in Python from multi-line string literals like this:

bannedWords = """
banned
words
are
bad
mmkay
""".split()
Ilja Everilä
  • I actually tried this as well, but for some reason as soon as one of the banned words is typed in the chat, the bot crashes. Maybe my implementation is wrong? – Aaron Apr 01 '16 at 17:01
  • @Xarotic Your question did not quite make it clear that that's what you're after. Add the tracebacks you get and reword your question. – Ilja Everilä Apr 01 '16 at 17:03
  • Also, do you mean that you tried this type of solution before, or you tried this answer's solution now and got a new exception from somewhere else? – Ilja Everilä Apr 01 '16 at 17:07
  • @Ilja the use of `map` will differ across 2.x and 3.x (not to mention that calling dunder methods unnecessarily just looks painful) - you can make this consistent and equally clear by using: `if any(word in message for word in bannedWords)` – Jon Clements Apr 01 '16 at 17:08
  • @JonClements it will not differ in a meaningful way, unless there are thousands and thousands of banned words. The other will test over a list, the other over an iterable. End result is the same. – Ilja Everilä Apr 01 '16 at 17:08
  • @Ilja in 2.x `map` will materialise a list first, which kind of breaks the advantage of having `any`'s shortcut behaviour... (of course the end result is the same and for just a short `bannedWords` list it doesn't matter, but if you scale that to a large, large amount, you're building an unwanted list) – Jon Clements Apr 01 '16 at 17:10
  • @JonClements was not explicitly after the shortcut behaviour. Also, I'm not quite in support of promoting python 2 solutions anymore, unless OP clearly states an intent to use it. Here we don't have such clear intent, as print is used as a function and no tags are present. – Ilja Everilä Apr 01 '16 at 17:11
  • @Ilja The point about using dunder methods still stands - and besides why not write code that behaves exactly the same way across versions and is more readable? :p – Jon Clements Apr 01 '16 at 17:16
  • @JonClements you got me on using a dunder, it's not exactly a best practice. But I do resist avoiding promoting Python 3 solutions for the sake of version support with Python 2. New code should not be written with Python 2 in mind, unless absolutely necessary. In my opinion. – Ilja Everilä Apr 01 '16 at 17:20

If you want exact matches, use a set of words, call lower on the string, and check whether the set of bad words is disjoint from the words in the message:

banned_set = {"badword1", "badword2"}
if banned_set.isdisjoint(message.lower().split()):
    # no bad words in the message
    pass

If "foo" were banned and "foobar" were perfectly valid, then using in/__contains__ would wrongly filter "foobar" as well, so you need to decide carefully which way to go.
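To make the difference concrete, here is a small side-by-side sketch of the two tests on the same message. The banned word and the message are hypothetical examples.

```python
banned_set = {"foo"}  # hypothetical banned word
message = "I love foobar"

# Substring test: "foo" is found inside "foobar", so the message is flagged.
substring_hit = any(word in message.lower() for word in banned_set)

# Whole-word test: the message is split on whitespace first, so only the
# exact token "foo" would count as a match.
whole_word_hit = not banned_set.isdisjoint(message.lower().split())
```

With these inputs `substring_hit` is true and `whole_word_hit` is false.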

If banned_set.isdisjoint(message.lower().split()) evaluates to True, the message contains no banned words and it is safe to proceed:

In [3]: banned_set = {"badword1", "badword2"}

In [4]: banned_set.isdisjoint("foo bar".split())
Out[4]: True

In [5]: banned_set.isdisjoint("foo bar badword1".split())
Out[5]: False
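Since isdisjoint answers "are there no bad words?", the ban decision needs the negated test. The following is only a sketch under some assumptions: the helper name `has_banned_word` is invented here, and trimming punctuation from each word is an extra step that is not part of the snippet above (plain split() would leave "badword1!" unmatched).

```python
import string


def has_banned_word(message, banned_set):
    """Return True if any whitespace-separated word of message is banned."""
    # Lower-case the message and trim surrounding punctuation from each word
    # so that "BadWord1!" still matches "badword1" (an added assumption).
    words = (w.strip(string.punctuation) for w in message.lower().split())
    # isdisjoint() is True when nothing is shared, so negate it.
    return not banned_set.isdisjoint(words)


banned_set = {"badword1", "badword2"}
should_ban = has_banned_word("foo bar BadWord1!", banned_set)
should_not_ban = has_banned_word("foo bar", banned_set)
```

In the bot loop from the question, the check would then read along the lines of: if has_banned_word(message, banned_set), call sendMessage(s, "/ban " + user).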
Padraic Cunningham
  • Good point on the `"foo"` vs. `"foobar"`, missed that altogether. – Ilja Everilä Apr 01 '16 at 17:14
  • @Padraic Cunningham Thanks, that makes a lot more sense to me! – Aaron Apr 01 '16 at 17:24
  • The problem I'm running into using this solution is that it tries to ban anyone who says anything except the words in banned_set – Aaron Apr 01 '16 at 17:33
  • Your old code had the test the other way. Where you were testing if the message actually contains bad words, this solution tests that there are no bad words. – Ilja Everilä Apr 01 '16 at 17:36
  • @Xarotic, Ilja is correct, if `banned_set.isdisjoint(message.lower().split())` evaluates to True then there are no bad words. – Padraic Cunningham Apr 01 '16 at 17:40
  • I just realized my faulty logic! Is this a bad way to do it? `if banned_set.isdisjoint(message.lower().split()):` `break` `else: ` `sendMessage(s, "/ban " + user)` – Aaron Apr 01 '16 at 17:40
  • I clearly don't know how to properly comment code.. but I hope you understand what I mean about the if - else – Aaron Apr 01 '16 at 17:43
  • @Xarotic use `if not` - that'll then mean "if there are bad words..." – Jon Clements Apr 01 '16 at 17:45
  • @Xarotic, you can use `if not banned_set.isdisjoint(message.lower().split()): sendMessage(s, "/ban " + user)` and break, the else is not needed. if `not banned_set.isdisjoint(..` evaluates to True then there are bad words. – Padraic Cunningham Apr 01 '16 at 17:45
  • @PadraicCunningham Thank you so much. I'm clearly a bigger noob than I thought! – Aaron Apr 01 '16 at 17:48
  • @Xarotic, no worries, it can be a little confusing. – Padraic Cunningham Apr 01 '16 at 17:48