32

How do I remove an element from a list if it matches a substring?

I have tried removing elements from a list using pop() and enumerate(), but it seems I'm missing a few contiguous items that need to be removed:

sents = ['@$\tthis sentences needs to be removed', 'this doesnt',
     '@$\tthis sentences also needs to be removed',
     '@$\tthis sentences must be removed', 'this shouldnt',
     '# this needs to be removed', 'this isnt',
     '# this must', 'this musnt']

for i, j in enumerate(sents):
  if j[0:3] == "@$\t":
    sents.pop(i)
    continue
  if j[0] == "#":
    sents.pop(i)

for i in sents:
  print i

Output:

this doesnt
@$  this sentences must be removed
this shouldnt
this isnt
#this should
this musnt

Desired output:

this doesnt
this shouldnt
this isnt
this musnt
martineau
alvas
  • Classic case of removing items from a list while you're iterating over that list. Read the dozens of other Stack Overflow questions that relate to this. Also, see the [note in the docs](http://docs.python.org/reference/compound_stmts.html#for). – John Y Oct 01 '12 at 02:47
  • You should always avoid changing the length of a container while iterating through it; this is a recipe for disaster. – wim Oct 01 '12 at 02:48
  • In general, it's usually better to create a new filtered list than to try to modify a list in-place. Immutable algorithms are always easier to reason through (although not always easier to figure out how to write). When you're just replacing values, sometimes the efficiency gains of working in-place beat that, but when you're deleting or inserting in the middle of a list, you're usually getting _worse_ efficiency along with your less robust logic. – abarnert Oct 01 '12 at 03:36
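
A minimal sketch of the advice in the comments above, assuming Python 3 and the sents list from the question. It builds a filtered list rather than popping during iteration; if other references to the list must see the change, the result can be assigned back through a slice. The name alias is introduced here only for illustration.

```python
sents = ['@$\tthis sentences needs to be removed', 'this doesnt',
         '@$\tthis sentences also needs to be removed',
         '@$\tthis sentences must be removed', 'this shouldnt',
         '# this needs to be removed', 'this isnt',
         '# this must', 'this musnt']

alias = sents  # hypothetical second reference to the same list object

# Build the filtered list first, then replace the contents in place with
# slice assignment, so every reference to the list sees the new contents.
sents[:] = [s for s in sents
            if not (s.startswith('@$\t') or s.startswith('#'))]

for s in sents:
    print(s)
```

Plain assignment (sents = [...]) would also work when nothing else holds a reference to the original list.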

3 Answers

45

How about something simple like:

>>> [x for x in sents if not x.startswith('@$\t') and not x.startswith('#')]
['this doesnt', 'this shouldnt', 'this isnt', 'this musnt']
D.Shawley
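
str.startswith also accepts a tuple of prefixes, so the two tests in the answer above can be folded into one call. A small sketch, assuming Python 3; the names prefixes and kept are mine, not from the answer.

```python
sents = ['@$\tthis sentences needs to be removed', 'this doesnt',
         '@$\tthis sentences also needs to be removed',
         '@$\tthis sentences must be removed', 'this shouldnt',
         '# this needs to be removed', 'this isnt',
         '# this must', 'this musnt']

# Hypothetical name: every prefix that marks a string for removal.
prefixes = ('@$\t', '#')

# startswith returns True if the string begins with any prefix in the tuple.
kept = [x for x in sents if not x.startswith(prefixes)]

print(kept)
```

This keeps the list of prefixes in one place, which helps if more markers are added later.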
19

This should work:

[i for i in sents if not ('@$\t' in i or '#' in i)]

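If you only want to match strings that begin with the specified substrings, use the str.startswith(stringOfInterest) method. For example, this keeps only the items that start with '#' (negate the test to drop them instead):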

[i for i in sents if i.startswith('#')]
JayRizzo
mjgpy3
  • I'd argue this one is better than the other two for not assuming the substrings are at the start – Frikster Jul 21 '15 at 18:04
  • FYI: if you get a `NoneType` error when using this, check your values and be sure to remove any `None` values from your list. See: https://www.geeksforgeeks.org/python-remove-none-values-from-list/ – JayRizzo Jun 08 '22 at 19:42
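
To illustrate the two comments above, here is a hedged sketch assuming Python 3. The sample list mixed is made up for illustration and is not from the question. It shows how a substring test and a prefix test differ, and why None entries need to be dropped first.

```python
# Made-up sample data: one None entry and one string with '#' in the middle.
mixed = ['# comment line', 'price is #1', None, '@$\tremove me', 'keep me']

# Drop None first: both `'#' in None` and `None.startswith(...)` raise an error.
strings = [s for s in mixed if s is not None]

# Substring test: removes any string containing a marker anywhere.
by_substring = [s for s in strings if not ('@$\t' in s or '#' in s)]

# Prefix test: removes only strings that begin with a marker.
by_prefix = [s for s in strings if not s.startswith(('@$\t', '#'))]

print(by_substring)  # ['keep me']
print(by_prefix)     # ['price is #1', 'keep me']
```

Which test is right depends on whether a marker in the middle of a string should also cause removal.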
14

Another technique, using filter:

filter(lambda s: not (s[0:3] == "@$\t" or s[0] == "#"), sents)

The problem with your original approach is that when you're on list item i and determine it should be deleted, you remove it from the list, which slides the i+1 item into the i position. On the next iteration the loop is at index i+1, but the item there is the one that was originally at i+2, so the original i+1 item is never examined.

Make sense?
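
The skipping described above can be traced with a short sketch, assuming Python 3. The function remove_while_iterating and the names seen, skipped, and fixed are hypothetical, introduced only for this illustration. The first part reproduces the question's loop on a copy and records which items the loop actually visited; the second part shows one in-place alternative, walking the indices from the end so that a pop never shifts an item that has not been checked yet.

```python
sents = ['@$\tthis sentences needs to be removed', 'this doesnt',
         '@$\tthis sentences also needs to be removed',
         '@$\tthis sentences must be removed', 'this shouldnt',
         '# this needs to be removed', 'this isnt',
         '# this must', 'this musnt']


def remove_while_iterating(items):
    """Run the question's loop on a copy; return the result and the items visited."""
    items = list(items)
    seen = []
    for i, s in enumerate(items):
        seen.append(s)
        if s.startswith('@$\t') or s.startswith('#'):
            items.pop(i)  # shifts every later item one position to the left
    return items, seen


result, seen = remove_while_iterating(sents)
skipped = [s for s in sents if s not in seen]
print(skipped)  # items the loop never looked at

# In-place alternative: iterate over the indices backwards.
fixed = list(sents)
for i in range(len(fixed) - 1, -1, -1):
    if fixed[i].startswith('@$\t') or fixed[i].startswith('#'):
        fixed.pop(i)
print(fixed)
```

The skipped list includes the third '@$\t' sentence, which is why it survives in the buggy version.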

cod3monk3y
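
A note for Python 3 readers, as a hedged sketch: there filter returns a lazy iterator rather than a list, so the filter approach above needs a list() call to produce a list. Using the slice s[0:1] instead of the index s[0] is my change, so that an empty string does not raise IndexError; the trailing '' in the data is also my addition to show that case.

```python
sents = ['@$\tthis sentences needs to be removed', 'this doesnt',
         '@$\tthis sentences also needs to be removed',
         '@$\tthis sentences must be removed', 'this shouldnt',
         '# this needs to be removed', 'this isnt',
         '# this must', 'this musnt',
         '']  # empty string added for illustration only

# Slices never raise on short strings: ''[0:1] is '' while ''[0] is an error.
kept = list(filter(lambda s: not (s[0:3] == "@$\t" or s[0:1] == "#"), sents))

for s in kept:
    print(repr(s))
```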