46

I have two lists:

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

I want to remove all sub_list occurrences in big_list.

result should be [2, 3, 4]

For strings you could use this:

'2123124'.replace('12', '')

But AFAIK this does not work for lists.

This is not a duplicate of Removing a sublist from a list since I want to remove all sub-lists from the big-list. In the other question the result should be [5,6,7,1,2,3,4].

Update: For simplicity I took integers in this example. But list items could be arbitrary objects.

Update2:

if big_list = [1, 2, 1, 2, 1] and sub_list = [1, 2, 1], I want the result to be [2, 1] (like '12121'.replace('121', ''))

Update3:

I don't like copy+pasting source code from StackOverflow into my code. That's why I created second question at software-recommendations: https://softwarerecs.stackexchange.com/questions/51273/library-to-remove-every-occurrence-of-sub-list-from-list-python

Update4: if you know a library to make this one method call, please write it as answer, since this is my preferred solution.

The test should pass this test:

def test_remove_sub_list(self):
    self.assertEqual([1, 2, 3], remove_sub_list([1, 2, 3], []))
    self.assertEqual([1, 2, 3], remove_sub_list([1, 2, 3], [4]))
    self.assertEqual([1, 3], remove_sub_list([1, 2, 3], [2]))
    self.assertEqual([1, 2], remove_sub_list([1, 1, 2, 2], [1, 2]))
    self.assertEquals([2, 1], remove_sub_list([1, 2, 1, 2, 1], [1, 2, 1]))
    self.assertEqual([], remove_sub_list([1, 2, 1, 2, 1, 2], [1, 2]))
guettli
  • 25,042
  • 81
  • 346
  • 663
  • 2
    Possible duplicate of [Removing a sublist from a list](https://stackoverflow.com/questions/42881650/removing-a-sublist-from-a-list) – glibdud Jul 25 '18 at 12:16
  • 1
    Well if you have only integers in the list you can go through a conversion to strings: `list(map(int, (''.join(map(str, big_list)).replace(''.join(map(str, sub_list)), ''))))`. Or do you want to apply this to arbitrary objects? – a_guest Jul 25 '18 at 12:22
  • 2
    Just for clarification, if you have `big_list = [1, 2, 1, 2, 1]` and `sub_list = [1, 2, 1]` do you want the result to be `[2, 1]` or `[]` (i.e. remove per occurrence or remove all items that match the `sub_list` pattern)? – a_guest Jul 25 '18 at 13:02
  • @a_guest I updated the question. – guettli Jul 25 '18 at 14:45
  • Just curious, no offend at all, why your rep 3.6k while you have answer with 450+ upvotes.... – Marcus.Aurelianus Jul 25 '18 at 15:27
  • 2
    @Marcus.Aurelianus Probably because most upvotes on that answer occurred on a few days: Stack Overflow has a daily reputation cap at 200. So if more than 20 people upvote your answers within 24h, only the first 20 upvotes (×10 = 200 points) are counted. – Konrad Rudolph Jul 25 '18 at 16:04
  • @Konrad Rudolph, thanks! – Marcus.Aurelianus Jul 25 '18 at 16:28
  • 1
    @Marcus.Aurelianus Konrad's guess is incorrect. The truth is that OP spends most of his reputation on bounties - https://stackoverflow.com/users/633961/guettli?tab=bounties – Leon Jul 30 '18 at 10:55
  • @Leon Oh, very cool. Hats off to OP. I should do this much more, I always forget. – Konrad Rudolph Jul 30 '18 at 11:05
  • @Leon, awesome programmer, hats off to OP. – Marcus.Aurelianus Jul 30 '18 at 11:21
  • 1
    @guettli. Just out of curiosity, why did you add the bounty? Your question has quite a few reasonable answers, so I'm wondering what's missing, or what you expect to find. – Mad Physicist Jul 30 '18 at 13:52
  • @MadPhysicist I gave the bounty, because I would like to have a one-liner. I like libraries. – guettli Jul 30 '18 at 14:32
  • 2
    @guettli. Why not stick a function in your library and use it as a one liner? – Mad Physicist Jul 30 '18 at 15:26
  • @MadPhysicist I like to reuse software. And I still think that a library like numpy has a simple method to solve this question. – guettli Jul 31 '18 at 06:41
  • numpy in particular is a bad example for two reasons. One being that it's not made for this sort of thing, and two that it has lots of examples of python code just line this wrapping a bunch of C functions just so you can have a one liner. – Mad Physicist Jul 31 '18 at 10:51
  • 1
    Another point is that your question is becoming admittedly off topic for SO, bounty not withstanding. – Mad Physicist Jul 31 '18 at 10:52
  • 4
    Finally, the statement "I like to reuse software" is specious at best, given everything else you've say said. There is nothing preventing you from reusing the software besides a completely artificial constraint that you've imposed. – Mad Physicist Jul 31 '18 at 10:55
  • 2
    If there is an external library to solve this problem, would you require an optimized solution written in C, or would you accept something implemented in Python? In the latter case, I could just upload my solution to GitHub with a setup.py file and call it a library. – Mad Physicist Jul 31 '18 at 10:57
  • "I don't like copy+pasting source code from StackOverflow into my code". Entirely a matter of taste. I, on the other hand, really don't mind. In fact, I have a few files of recipes form SO, which were written by the same people that were the libraries I'm using. This includes numpy, matplotlib, python-docx and even python itself. – Mad Physicist Jul 31 '18 at 11:01

13 Answers13

26

You'd have to implement it yourself. Here is the basic idea:

def remove_sublist(lst, sub):
    i = 0
    out = []
    while i < len(lst):
        if lst[i:i+len(sub)] == sub:
            i += len(sub)
        else:
            out.append(lst[i])
            i += 1
    return out

This steps along every element of the original list and adds it to an output list if it isn't a member of the subset. This version is not very efficient, but it works like the string example you provided, in the sense that it creates a new list not containing your subset. It also works for arbitrary element types as long as they support ==. Removing [1,1,1] from [1,1,1,1] will correctly result in [1], as for a string.

Here is an IDEOne link showing off the result of

>>> remove_sublist([1, 'a', int, 3, float, 'a', int, 5], ['a', int])
[1, 3, <class 'float'>, 5]
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • 1
    come on, you can do better than this! =) – lenik Jul 25 '18 at 12:22
  • @lenik. Fair enough. Hopefully that's less lazy. I couldn't come up with a more streamlined solution unfortunately. And doing it in-place wasn't going to be any prettier. – Mad Physicist Jul 25 '18 at 12:34
  • no problem, seems like your solution is the only left =) – lenik Jul 25 '18 at 12:35
  • @MadPhysicist, Should we close either this question or [this question](https://stackoverflow.com/questions/42881650/removing-a-sublist-from-a-list) as a duplicate of the other? Your solution is great and different, but possibly these solutions should be in one place. – jpp Jul 25 '18 at 12:47
  • 1
    @jpp. I don't think the questions are equivalent. There are things you can do to get just the first element that wouldn't be possible here, like short circuiting the loop. At the same time, getting all the subsequences is more general than getting just the one. – Mad Physicist Jul 25 '18 at 12:53
  • @MadPhysicist, Good point (and good thing I held off VTC!).. Agreed, this is different. I would then suggest we differentiate the titles. As this can confuse! – jpp Jul 25 '18 at 12:54
  • 1
    @jpp. I did think that before I started writing an answer though. They problems are very closely related, but I couldn't come up with a nice generator that I could just call next on or run to completion since we're removing elements here. – Mad Physicist Jul 25 '18 at 12:55
  • 1
    @jpp. I changed the title of this one. – Mad Physicist Jul 25 '18 at 12:56
  • An edge case was missing :-) I updated the question. – guettli Aug 09 '18 at 07:31
14

Try del and slicing. The worst time complexity is O(N^2).

sub_list=['a', int]
big_list=[1, 'a', int, 3, float, 'a', int, 5]
i=0
while i < len(big_list):
    if big_list[i:i+len(sub_list)]==sub_list:
        del big_list[i:i+len(sub_list)]
    else:
        i+=1

print(big_list)

result:

[1, 3, <class 'float'>, 5]
Marcus.Aurelianus
  • 1,520
  • 10
  • 22
8

A recursive approach:

def remove(lst, sub):
    if not lst:
        return []
    if lst[:len(sub)] == sub:
        return remove(lst[len(sub):], sub)
    return lst[:1] + remove(lst[1:], sub)
print(remove(big_list, sub_list))

This outputs:

[2, 3, 4]
blhsing
  • 91,368
  • 6
  • 71
  • 106
6

A improved version to check whether lst[i:i+len(sub)] < len(lst)

def remove_sublist(lst, sub):
    i = 0
    out = []
    sub_len = len(sub)
    lst_len = len(lst)
    while i < lst_len:
        if (i+sub_len) < lst_len:
            if lst[i: i+sub_len] == sub:
                i += sub_len
            else:
                out.append(lst[i])
                i += 1
        else:
            out.append(lst[i])
            i += 1

    return out
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
mingganz
  • 363
  • 1
  • 7
  • This is not improved in terms of legibility. The short subsequences at the end will never be equal to sub anyway and list is smart enough to check length first for ==. – Mad Physicist Jul 25 '18 at 13:00
  • You're basically making the check harder to read *and* less efficient. – Mad Physicist Jul 25 '18 at 13:01
  • 1
    If list 'sub' is a big one, I think 'if (i+sub_len) < lst_len' should be more efficent than 'if lst[i: i+sub_len] == sub' from the view of perfomance. 'lst[i: i+sub_len]' will need to generate a list and this will cost memory, right? – mingganz Jul 25 '18 at 13:12
  • I suppose so. Too bad lists don't let you get a view instead of a copy of the slice. +1 – Mad Physicist Jul 25 '18 at 14:51
6

How about this:

def remove_sublist(lst, sub):
    max_ind_sub = len(sub) - 1
    out = []
    i = 0
    tmp = []

    for x in lst:
        if x == sub[i]:
            tmp.append(x)
            if i < max_ind_sub: # partial match 
                i += 1
            else:  # found complete match
                i = 0
                tmp = []
        else:
            if tmp:  # failed partial match 
                i = 0
                out += tmp
            if x == sub[0]:  # partial match
                i += 1
                tmp = [x]
            else:
                out.append(x)

    return out

Performance:

lst = [2, 1, 2, 3, 1, 2, 4]
sub = [1, 2]
%timeit remove_sublist(lst, sub)  # solution of Mad Physicist
%timeit remove_sublist_new(lst, sub)
>>> 2.63 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> 1.77 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Update

My first solution had a bug. Was able to fix it (updated my code above) but the method looks way more complicated now. In terms of performance it still does better than the solution from Mad Physicist on my local machine.

RandomDude
  • 1,101
  • 18
  • 33
5

Use itertools.zip_longest to create n element tuples (where n is length of sub_list) and then filter the current element and next n-1 elements when one of the element matched the sub_list

>>> from itertools import zip_longest, islice
>>> itr = zip_longest(*(big_list[i:] for i in range(len(sub_list))))
>>> [sl[0] for sl in itr if not (sl == tuple(sub_list) and next(islice(itr, len(sub_list)-2, len(sub_list)-1)))]
[2, 3, 4]

To improve the efficiency, you can calculate tuple(sub_list) and len(sub_list) before hand you start filtering

>>> l = len(sub_list)-1
>>> tup = tuple(sub_list)
>>> [sl[0] for sl in itr if not (sl == tup and next(islice(itr, l-1, l)))]
[2, 3, 4]
Sunitha
  • 11,777
  • 2
  • 20
  • 23
5

Update: The more_itertools library has released more_itertool.replace, a tool that solves this particular problem (see Option 3).

First, here are some other options that work on generic iterables (lists, strings, iterators, etc.):

Code

Option 1 - without libraries:

def remove(iterable, subsequence):
    """Yield non-subsequence items; sans libraries."""
    seq = tuple(iterable)
    subsequence = tuple(subsequence)
    n = len(subsequence)
    skip = 0

    for i, x in enumerate(seq):
        slice_ = seq[i:i+n]
        if not skip and (slice_ == subsequence):
            skip = n
        if skip:
            skip -= 1
            continue
        yield x   

Option 2 - with more_itertools

import more_itertools as mit


def remove(iterable, subsequence):
    """Yield non-subsequence items."""
    iterable = tuple(iterable)
    subsequence = tuple(subsequence)
    n = len(subsequence)
    indices = set(mit.locate(mit.windowed(iterable, n), pred=lambda x: x == subsequence))

    it_ = enumerate(iterable)
    for i, x in it_:
        if i in indices:
            mit.consume(it_, n-1)
        else:
            yield x

Demo

list(remove(big_list, sub_list))
# [2, 3, 4]

list(remove([1, 2, 1, 2], sub_list))
# []

list(remove([1, "a", int, 3, float, "a", int, 5], ["a", int]))
# [1, 3, float, 5]

list(remove("11111", "111"))
# ['1', '1']

list(remove(iter("11111"), iter("111")))
# ['1', '1']

Option 3 - with more_itertools.replace:

Demo

pred = lambda *args: args == tuple(sub_list)
list(mit.replace(big_list, pred=pred, substitutes=[], window_size=2))
# [2, 3, 4]

pred=lambda *args: args == tuple(sub_list)
list(mit.replace([1, 2, 1, 2], pred=pred, substitutes=[], window_size=2))
# []

pred=lambda *args: args == tuple(["a", int])
list(mit.replace([1, "a", int, 3, float, "a", int, 5], pred=pred, substitutes=[], window_size=2))
# [1, 3, float, 5]

pred=lambda *args: args == tuple("111")
list(mit.replace("11111", pred=pred, substitutes=[], window_size=3))
# ['1', '1']

pred=lambda *args: args == tuple(iter("111"))
list(mit.replace(iter("11111"), pred=pred, substitutes=[], window_size=3))
# ['1', '1']

Details

In all of these examples, we are scanning the main sequence with smaller window slices. We yield whatever is not found in the slice and skip whatever is in the slice.

Option 1 - without libraries

Iterate an enumerated sequence and evaluate slices of size n (the length of the sub-sequence). If the upcoming slice equals the sub-sequence, reset skip and yield the item. Otherwise, iterate past it. skip tracks how many times to advance the loop, e.g. sublist is of size n=2, so it skips twice per match.

Note, you can convert this option to work with sequences alone by removing the first two tuple assignments and replacing the iterable parameter with seq, e.g. def remove(seq, subsequence):.

Option 2 - with more_itertools

Indices are located for every matching sub-sequence in an iterable. While iterating an enumerated iterator, if an index is found in indices, the remaining sub-sequence is skipped by consuming the next n-1 elements from the iterator. Otherwise, an item is yielded.

Install this library via > pip install more_itertools.

Option 3 - with more_itertools.replace:

This tool replaces a sub-sequence of items defined in a predicate with substitute values. To remove items, we substitute an empty container, e.g. substitutes=[]. The length of replaced items is specified by the window_size parameter (this value is equal to the length of the sub-sequence).

pylang
  • 40,867
  • 14
  • 129
  • 121
  • I'm really not a fan of the `tuple(iterable)` here. You're losing all the benefits of lazy evaluation just to get slicing. If you're going to write a generator solution, you should try to hold it to the same standards as the `itertools` generators, so you can hook them together into a lazily evaluated pipeline. – Patrick Haugh Aug 01 '18 at 13:37
  • No I was referring to the other two. `iterable = tuple(iterable)` and `seq = tuple(iterable)` – Patrick Haugh Aug 01 '18 at 17:08
  • Ah I see. We cast to tuples in order to use native slicing. The other option could be to use `itertools.islice`, but of course that requires a library. In Option 1, I mention you can remove the first two lines that cast to tuples. Then they would work on sequences only. Perhaps i'll add an option with `itertools`. Thanks. – pylang Aug 01 '18 at 17:15
  • @PatrickHaugh I appreciate the feedback. You mentioned "...try to hold it to the same standards as the `itertools` generators". However, `tuple(iterable)` is not uncommon in `itertools`, i.e. the combinatorics examples in the [docs](https://docs.python.org/3/library/itertools.html) and [recipes](https://docs.python.org/3/library/itertools.html#itertools-recipes). Which standards do you mean? – pylang Aug 01 '18 at 17:40
4

More readable than any above and with no additional memory footprint:

def remove_sublist(sublist, mainlist):

    cursor = 0

    for b in mainlist:
        if cursor == len(sublist):
            cursor = 0
        if b == sublist[cursor]:
            cursor += 1
        else:
            cursor = 0
            yield b

    for i in range(0, cursor):
        yield sublist[i]

This is for onliner if you wanted a function from library, let it be this

[x for x in remove_sublist([1, 2], [2, 1, 2, 3, 1, 2, 4])]
Dmitry Dyachkov
  • 1,715
  • 2
  • 19
  • 46
3

Kinda different approach in Python 2.x!

from more_itertools import locate, windowed
big_list = [1, 2, 1, 2, 1]
sub_list = [1, 2, 1]

"""
Fetching all starting point of indexes (of sub_list in big_list)
to be removed from big_list. 
"""

i = list(locate(windowed(big_list, len(sub_list)), pred=lambda x: x==tuple(sub_list)))

""" 
Here i comes out to be [0, 2] in above case. But index from 2 which 
includes 1, 2, 1 has last 1 from the 1st half of 1, 2, 1 so further code is
to handle this case.
PS: this won't come for-
big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]
as here i comes out to be [1, 4]
"""

# The further code.
to_pop = []
for ele in i:
    if to_pop:
        if ele == to_pop[-1]:
            continue
    to_pop.extend(range(ele, ele+len(sub_list)))

# Voila! to_pop consists of all the indexes to be removed from big_list.

# Wiping out the elements!
for index in sorted(to_pop, reverse=True):
    del big_list[index]

Note that you need to delete them in reverse order so that you don't throw off the subsequent indexes.

In Python3, signature of locate() will differ.

user5319825
  • 491
  • 1
  • 6
  • 16
1

(For final approach, see last code snippet)

I'd have thought a simple string conversion would be sufficient:

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

new_list = list(map(int, list((''.join(map(str, big_list))).replace((''.join(map(str, sub_list))), ''))))

I'm essentially doing a find/replace with the string equivalents of the lists. I'm mapping them to integers afterwards so that the original types of the variables are retained. This will work for any size of the big and sub lists.

However, it's likely that this won't work if you're calling it on arbitrary objects if they don't have a textual representation. Moreover, this method results in only the textual version of the objects being retained; this is a problem if the original data types need to be maintained.

For this, I've composed a solution with a different approach:

new_list = []
i = 0
while new_list != big_list:
    if big_list[i:i+len(sub_list)] == sub_list:
        del big_list[i:i+len(sub_list)]
    else:
        new_list.append(big_list[i])
        i += 1

Essentially, I'm removing every duplicate of the sub_list when I find them and am appending to the new_list when I find an element which isn't part of a duplicate. When the new_list and big_list are equal, all of the duplicates have been found, which is when I stop. I haven't used a try-except as I don't think there should be any indexing errors.

This is similar to @MadPhysicist's answer and is of roughly the same efficiency, but mine consumes less memory.

This second approach will work for any type of object with any size of lists and therefore is much more flexible than the first approach. However, the first approach is quicker if your lists are just integers.

However, I'm not done yet! I've concocted a one-liner list comprehension which has the same functionality as the second approach!

import itertools
new_list = [big_list[j] for j in range(len(big_list)) if j not in list(itertools.chain.from_iterable([ list(range(i, i+len(sub_list))) for i in [i for i, x in enumerate(big_list) if x == sub_list[0]] if big_list[i:i+len(sub_list)] == sub_list ]))]

Initially, this seems daunting, but I assure you it's quite simple! First, i create a list of the indices where the first element of the sublist has occured. Next, for each of these indices, I check if the following elements form the sublist. If they do, the range of indices which form the duplicate of the sublist is added to another list. Afterwards, I use a function from itertools to flatten the resulting list of lists. Every element in this flattened list is an index which is in a duplicate of the sublist. Finally, I create a new_list which consists of every element of the big_list which has an index not found in the flattened list.

I don't think that this method is in any of the other answers. I like it the most as it's quite neat once you realise how it works and is very efficient (due to the nature of list comprehensions).

Adi219
  • 4,712
  • 2
  • 20
  • 43
0

You can use recursion with a generator:

def remove(d, sub_list):
   if d[:len(sub_list)] == sub_list and len(sub_list) <= len(d[:len(sub_list)]):
      yield from [[], remove(d[len(sub_list):], sub_list)][bool(d[len(sub_list):])]
   else:
      yield d[0]
      yield from [[], remove(d[1:], sub_list)][bool(d[1:])]

tests = [[[2, 1, 2, 3, 1, 2, 4], [1, 2]], [[1, 2, 1, 2], [1, 2]], [[1, 'a', int, 3, float, 'a', int, 5], ['a', int]], [[1, 1, 1, 1, 1], [1,1,1]]]
for a, b in tests:
  print(list(remove(a, b)))

Output:

[2, 3, 4]
[]
[1, 3, <class 'float'>, 5]
[1, 1]
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
0

Just for fun, here is the closest approximation to a one-liner:

from functools import reduce

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]
result = reduce(lambda r, x: r[:1]+([1]+r[2:-r[1]],[min(len(r[0]),r[1]+1)]+r[2:])[r[-r[1]:]!=r[0]]+[x], big_list+[0], [sub_list, 1])[2:-1]

Don't trust that it works? Check it on IDEone!

Of course it's far from efficient and is disgustfully cryptic, however it should help to convince the OP to accept @Mad Physicist's answer.

Leon
  • 31,443
  • 4
  • 72
  • 97
  • "Of course it's far from efficient and is disgustfully cryptic, however it should help to convince the OP to accept @Mad Physicist's answer." +1 for that – Chris_Rands Aug 04 '18 at 12:23
0

What you are trying to achieve can be done by converting it into list of strings and after replacing again convert it to integer type.

In a single line you can do it like this

map(int,list(("".join(map(str, big_list))).replace("".join(map(str, sub_list)),'').replace(''.join((map(str, sub_list))[::-1]),'')))

Input

big_list = [1, 2, 1, 2, 1]
sub_list = [1, 2, 1]

Output

[2, 1]

Input

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

Ouput

[2, 3, 4]

HimanshuGahlot
  • 561
  • 4
  • 11
  • 1
    According to the question this should work for any type: objects, classes, ... AFAIK you can't make very object to a string. You could pickle them to a string .... but this sounds wired. – guettli Aug 01 '18 at 07:25