How to filter out words in Python?

Question

For example:

item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

I want to filter out 'dog' and 'cat' so that the result reads:

item = ['the  is gone', 'the  and  is gone']

item1=[] 
for w in words:
   for line in item:
      if w in line:
         j=gg.replace(it,'')
         item1.append(j)

Instead, I get the following:

['the  is gone', 'the  and cat is gone', 'the dog and  is gone']
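Assuming `gg` was meant to be `line` and `it` was meant to be `w` (the question's snippet defines neither name), the original loop can be reproduced as below. Because the outer loop runs over words, every line containing a given word is appended once per word, so the second line shows up twice, each copy with only one word removed.

```python
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

item1 = []
for w in words:            # outer loop over words...
    for line in item:      # ...inner loop over lines
        if w in line:
            # assumption: `gg` was `line` and `it` was `w` in the question
            j = line.replace(w, '')
            item1.append(j)  # one entry per (word, matching line) pair

print(item1)
# ['the  is gone', 'the  and cat is gone', 'the dog and  is gone']
```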

Jesse the Game · Accepted Answer · score 5 · 2012-12-01T07:05:49.077

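You're looping over all lines for each word and appending the result of every replacement, so a line containing several words is appended several times, each copy with only one word removed. You should swap the loops so that each line has all words removed before it is appended once: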

item1 = [] 
for line in item:
    for w in words:
        line = line.replace(w, '')
    item1.append(line)
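The corrected loop, wrapped in a hypothetical helper `remove_substrings` so it can be reused and checked against the expected result from the question:

```python
def remove_substrings(lines, words):
    """Remove every occurrence of each word from each line (substring match)."""
    result = []
    for line in lines:
        for w in words:
            line = line.replace(w, '')  # returns line unchanged when w is absent
        result.append(line)              # exactly one entry per input line
    return result


item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
print(remove_substrings(item, words))
# ['the  is gone', 'the  and  is gone']
```

The double spaces are expected: only the word itself is removed, not the spaces around it.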

Note: I altered some of your code:

changed gg to line
changed it to w
removed the check whether line contains w, since replace simply returns the line unchanged when w is absent

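replace does not know about word boundaries, so it also removes the word from inside longer words (e.g. removing go turns good into od). If you want to remove entire words only, use a different approach: re.sub with \b word-boundary anchors.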

import re

item1 = []
for line in item:
    for w in words:
        # '\b' is a word boundary; re.escape guards against regex metacharacters in w
        line = re.sub(r'\b%s\b' % re.escape(w), '', line)
    item1.append(line)
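A variation that is not part of the original answer: instead of one re.sub call per word, join all words into a single alternation such as `\b(?:dog|cat)\b` and compile it once. `remove_whole_words` is a hypothetical name, and the third sample line is made up to show that words merely containing a target (good, gone) are left intact.

```python
import re


def remove_whole_words(lines, words):
    # One pattern for all words; re.escape protects against regex
    # metacharacters, and \b on both sides restricts matches to whole words.
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))
    return [pattern.sub('', line) for line in lines]


item = ['the dog is gone', 'the dog and cat is gone', 'a good dog']
words = ['dog', 'cat', 'go']
print(remove_whole_words(item, words))
# ['the  is gone', 'the  and  is gone', 'a good ']
```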

edited Dec 01 '12 at 07:05

answered Dec 01 '12 at 06:16

Jesse the Game


When I do this with other words, it takes words apart, i.e. removing go turns good into od, even though go == good is False. – user1753878 Dec 01 '12 at 06:50
@user1753878 updated the answer to replace full words only – Jesse the Game Dec 01 '12 at 07:06
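The problem described in the comment can be reproduced with a short sketch (the sample sentence is made up): str.replace removes go from inside good, while the \b-anchored regex removes only the standalone word.

```python
import re

line = 'a good day to go'

# Substring match: 'go' is also removed from inside 'good'
print(line.replace('go', ''))                         # 'a od day to '

# Whole-word match: only the final standalone 'go' is removed
print(re.sub(r'\b%s\b' % re.escape('go'), '', line))  # 'a good day to '
```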

score 2 · Answer 2 · answered Jul 08 '14 at 08:13


You might use this approach instead:

item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

item2 = [" ".join([w for w in t.split() if w not in words]) for t in item]

print(item2)

# ['the is gone', 'the and is gone']
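A small tweak to the split/join approach, offered as a sketch rather than as part of the original answer: converting `words` to a set makes each membership test a hash lookup instead of a scan of the list, which helps when the word list is long. Like the original, it splits on whitespace, so runs of spaces collapse to one and a token with attached punctuation (such as `dog,`) will not match.

```python
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

stop = set(words)  # set membership is O(1) on average, vs. O(n) for a list
item2 = [' '.join(w for w in t.split() if w not in stop) for t in item]
print(item2)  # ['the is gone', 'the and is gone']
```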


Radek


This is significantly faster! Microseconds vs. milliseconds for my use cases. – ic_fl2 Jan 21 '19 at 13:50
