
I'm aware variations of this question have been asked already, but none of the ones I've been able to find have addressed my specific aim.

I am trying to take two lists in Python with string elements and remove the overlapping sections of the two. For example:

list1 = ["25","+","7","*","6","/","7"]
list2 = ["7","*","6"]

Should go to

["25","+","/","7"]

I've considered a list comprehension along the lines of

[i for i in list1 if i not in list2]

but this would yield

["25","+","/"]

as both instances of "7" would be taken out.

How can I achieve what I'm trying to do here? Thanks.

Edit - this was marked as a possible duplicate. In my example with the list comprehension, I already explained how it is a different problem to the one linked.

muke
  • You can try using a `Counter`. – apple apple Aug 07 '18 at 02:35
  • is the order important? – apple apple Aug 07 '18 at 02:36
  • Possible duplicate of [How to remove every occurrence of sub-list from list](https://stackoverflow.com/questions/51518601/how-to-remove-every-occurrence-of-sub-list-from-list) – blhsing Aug 07 '18 at 02:39
  • @blhsing It's not - the question you linked deals with every occurrence while I showed with my example with the list comprehension that that is not what I wanted. – muke Aug 07 '18 at 02:46
  • This is similar to finding a substring in a larger string. I would suggest you to read about KMP (Knuth-Morris-Pratt) algorithm, it can directly be applied to your scenario. – Vikash Kesarwani Aug 07 '18 at 02:56
  • @muke Your list comprehension example has nothing to do with the solutions provided in the link. Please reread the link. It really is the same question you have, which is to remove a sub-list from a larger list. – blhsing Aug 07 '18 at 02:56
  • @muke, can you define overlap for us? See blhsing's comment in answers for how you may get things more clear. – Tai Aug 07 '18 at 03:51

2 Answers


Essentially, you want a difference operation on a multi-set, i.e. a bag. Python implements this for the collections.Counter object:

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

So, for example:

>>> from collections import Counter
>>> list1 = ["25","+","7","*","6","/","7"]
>>> list2 = ["7","*","6"]
>>> list((Counter(list1) - Counter(list2)).elements())
['25', '+', '7', '/']

In CPython 3.6 the result follows the order in which elements first appear in list1, but only as an implementation detail. From Python 3.7 on, insertion order is guaranteed for dict, and Counter, as a dict subclass, inherits it. If order is important and you are on an older version, you may have to implement an ordered counter.

Indeed, the docs themselves provide just such a recipe:

>>> from collections import Counter, OrderedDict
>>> class OrderedCounter(Counter, OrderedDict):
...     'Counter that remembers the order elements are first encountered'
...     def __repr__(self):
...         return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
...     def __reduce__(self):
...         return self.__class__, (OrderedDict(self),)
...
>>> list((OrderedCounter(list1) - OrderedCounter(list2)).elements())
['25', '+', '/', '7']
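
If the goal is to keep the order of list1 while dropping one occurrence per element of list2, a single pass that uses a Counter as a removal budget is another option. This is only a minimal sketch and not part of the docs recipe: the name `subtract_first_occurrences` is made up for illustration, and it drops the earliest matching occurrences whether or not they sit next to each other.

```python
from collections import Counter

def subtract_first_occurrences(items, to_remove):
    """Return items minus one occurrence per element of to_remove, keeping order."""
    budget = Counter(to_remove)  # how many copies of each value may still be dropped
    result = []
    for item in items:
        if budget[item] > 0:
            budget[item] -= 1  # consume one pending removal and skip this item
        else:
            result.append(item)
    return result

list1 = ["25", "+", "7", "*", "6", "/", "7"]
list2 = ["7", "*", "6"]
print(subtract_first_occurrences(list1, list2))  # ['25', '+', '/', '7']
```

Each element is looked up once, so the pass is linear in the combined length of the two lists.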
juanpa.arrivillaga
  • This solution actually does not maintain order (even with `OrderedCounter`). Try `list1 = ["6","/","7","6","7","6"]` and `list2 = ["7","6","7"]`. The output is `['6', '6', '/']` when it should be `['6', '/', '6']`. – blhsing Aug 07 '18 at 03:21
  • @blhsing I get `['/','6','6']`, i.e. maintaining the order of `list1`. However, you are right that what "maintaining order" means here is ambiguous. I'm not sure exactly what the OP wants, and they haven't commented on that, but I see how "overlap" would imply that. – juanpa.arrivillaga Aug 07 '18 at 03:25
  • My understanding is that the OP wants to emulate string replacement with lists, so it's like `'6/7676'.replace('767', '')`, where the result is `'6/6'`, so to speak, which is why I said the expected output in this case really should be `['6', '/', '6']`. – blhsing Aug 07 '18 at 03:30
  • @blhsing right, I understand what you are saying, but I think that it is ambiguous in that regard. In any case, if the elements will always be strings, then you'd be probably hard-pressed to beat `list(''.join(list1).replace(''.join(list2), ''))` – juanpa.arrivillaga Aug 07 '18 at 03:33
  • Yes, it's slightly ambiguous. But in this case a simple string replacement won't do because in the OP's example there is a string in the list that is more than one character long. – blhsing Aug 07 '18 at 03:39
  • @blhsing ah, yes, I see what you are saying – juanpa.arrivillaga Aug 07 '18 at 03:42

Using the remove method. This is probably slow: O(n^2) in the worst case, since each call to remove scans list1 from the start.

list.remove(x)

Remove the first item from the list whose value is x. It is an error if there is no such item.

for i in list2:
    list1.remove(i)

# list1 becomes
['25', '+', '/', '7']
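
The loop above changes list1 in place and raises ValueError as soon as list2 contains something that list1 lacks. A hedged variant that works on a copy could look like the sketch below. The helper name `remove_each_once` is hypothetical, and silently skipping missing values is an assumption about the desired behaviour, not something the question specifies.

```python
def remove_each_once(items, to_remove):
    """Return a copy of items with the first occurrence of each to_remove element deleted."""
    remaining = list(items)  # copy so the caller's list is left untouched
    for value in to_remove:
        try:
            remaining.remove(value)  # list.remove deletes only the first match
        except ValueError:
            pass  # assumption: values that are not present are ignored
    return remaining

list1 = ["25", "+", "7", "*", "6", "/", "7"]
list2 = ["7", "*", "6"]
print(remove_each_once(list1, list2))  # ['25', '+', '/', '7']
```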
Tai
  • I believe this will work well, although depending on the lists, it could potentially perform rather poorly. – juanpa.arrivillaga Aug 07 '18 at 02:53
  • @juanpa.arrivillaga agree. Added a comment. – Tai Aug 07 '18 at 02:53
  • I see this performing well if the pairs are small (better than the dict approach I'd wager). If you are working with two large lists, then the worst-case quadratic time will hit you. – juanpa.arrivillaga Aug 07 '18 at 02:54
  • @juanpa.arrivillaga thanks for letting us know the results of your experiments. – Tai Aug 07 '18 at 03:02
  • This solution actually does not maintain order. Try `list1 = ["6","/","7","6","7","6"]` and `list2 = ["7","6","7"]`. The output is `['/', '6', '6']` when it should be `['6', '/', '6']`. – blhsing Aug 07 '18 at 03:24
  • @blhsing good observation. I think I may have misunderstood the problem. – Tai Aug 07 '18 at 03:49
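
For the other reading raised in the comments, where list2 has to appear as a contiguous run in list1 (the list analogue of `str.replace`), a straightforward scan might look like the following. It is a sketch under assumptions: `remove_sublist` and its `max_count` parameter are hypothetical names, both arguments are assumed to be lists, and the naive comparison costs O(n*m) in the worst case, which the KMP algorithm mentioned above could improve on.

```python
def remove_sublist(items, sub, max_count=None):
    """Return items with contiguous, non-overlapping runs equal to sub removed.

    Scans left to right; max_count limits how many runs are removed
    (None means remove every run found).
    """
    if not sub:
        return list(items)  # nothing to search for
    result = []
    removed = 0
    i = 0
    while i < len(items):
        can_remove = max_count is None or removed < max_count
        if can_remove and items[i:i + len(sub)] == sub:
            i += len(sub)  # skip the whole matching run
            removed += 1
        else:
            result.append(items[i])
            i += 1
    return result

print(remove_sublist(["25", "+", "7", "*", "6", "/", "7"], ["7", "*", "6"]))
# ['25', '+', '/', '7']
print(remove_sublist(["6", "/", "7", "6", "7", "6"], ["7", "6", "7"]))
# ['6', '/', '6']
```

Passing `max_count=1` removes only the first run, which may be closer to what the question asks for if list2 can occur more than once.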