Let's say I have a sequence that goes like this:

seq = (1, 1, 1, 1, 4, 6, 8, 4, 3, 3, 3,)

Some arbitrary number of 1s, followed by some arbitrary number of even numbers, followed by some 3s. If I try to split it up like so:

from itertools import takewhile

it = iter(seq)
ones = list(takewhile(lambda x: x == 1, it))
evens = list(takewhile(lambda x: x%2 == 0, it))
threes = list(takewhile(lambda x: x == 3, it))

This almost works out... except I miss the first even number and the first three, since each has already been consumed by the preceding takewhile. Is there a way to do this kind of partitioning by just walking the iterator forward, predicate by predicate?
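To make the loss concrete, here is a minimal runnable sketch of the attempt above, with the lists it produces shown as comments. Each `takewhile` has to read one element past its run to see the predicate fail, and that element is discarded.

```python
from itertools import takewhile

seq = (1, 1, 1, 1, 4, 6, 8, 4, 3, 3, 3)

it = iter(seq)
ones = list(takewhile(lambda x: x == 1, it))       # also consumes the 4
evens = list(takewhile(lambda x: x % 2 == 0, it))  # also consumes the first 3
threes = list(takewhile(lambda x: x == 3, it))

print(ones)    # [1, 1, 1, 1]
print(evens)   # [6, 8, 4]
print(threes)  # [3, 3]
```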

Barry
  • I think you're going to have this problem with everything in `itertools` because it _has_ to look at the next value to see if it matches the predicate, but that consumes it like you said. If you wrote a custom solution you could write a function that returned a list _and_ the first non-matching value instead of throwing it out, but if you need to stick to generators you may have to get creative or use a class to store state. – Two-Bit Alchemist Aug 29 '16 at 00:35
  • Have you looked at `itertools.groupby`? – BrenBarn Aug 29 '16 at 00:36
  • @BrenBarn I have multiple, possibly-non-disjoint predicates. I'm not grouping by a key. – Barry Aug 29 '16 at 00:44
  • You might want to take a look at a neat answer by Martijn, http://stackoverflow.com/questions/30615659/how-not-to-miss-the-next-element-after-itertools-takewhile – Mazdak Aug 29 '16 at 00:54
  • In what way are your grouping criteria non-disjoint? I wrote an answer showing how you can handle this case with `groupby`. – BrenBarn Aug 29 '16 at 01:03
  • @BrenBarn This simple example happens to be non-disjoint. I know I can handle *this* case with `groupby`, but that doesn't solve the general question. Pretend after the `3`s there's another group of `5`s. – Barry Aug 29 '16 at 01:38
  • My answer will also work if there is another group of 5s afterwards. Can you show an example that actually illustrates a case that you think cannot be handled by groupby? – BrenBarn Aug 29 '16 at 01:39
  • I think the question requirements could be better defined, eg: what is the expected result if the `seq` starts with a `5`? – Craig Burgler Aug 29 '16 at 02:03

3 Answers

You could do something like this:

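def multi_takewhile(predicates, iterable):
    ipredicates = iter(predicates)
    predicate = next(ipredicates, None)
    if predicate is None:
        return  # no predicates, so there is nothing to yield

    last_chunk = []

    for element in iterable:
        while not predicate(element):
            # The current predicate failed: emit its chunk and move on.
            yield last_chunk
            last_chunk = []

            predicate = next(ipredicates, None)
            if predicate is None:
                # Out of predicates; `element` has already been consumed.
                return

        last_chunk.append(element)

    # The input ran out while the current predicate still held.
    yield last_chunk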

It still consumes the first element that fails the last predicate if you run out of predicates, though. You can modify the function to return that element in another list, or write your own iterable wrapper that keeps track of it for you.
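As a rough sketch of the wrapper idea, the following assumes a hypothetical `PushbackIterator` class and `take_run` helper (neither is part of `itertools`). The wrapper holds at most one pushed-back item, so the element that fails a predicate stays available to the next call. Unlike `takewhile`, `take_run` is eager and returns a list.

```python
class PushbackIterator:
    """Hypothetical wrapper: an iterator that can take one item back."""

    _EMPTY = object()  # sentinel meaning "nothing has been pushed back"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pending = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the pushed-back item first, if there is one.
        if self._pending is not self._EMPTY:
            item, self._pending = self._pending, self._EMPTY
            return item
        return next(self._it)

    def push_back(self, item):
        self._pending = item


def take_run(predicate, it):
    """Collect leading items that satisfy predicate; keep the first failure in `it`."""
    run = []
    for item in it:
        if not predicate(item):
            it.push_back(item)
            break
        run.append(item)
    return run


seq = (1, 1, 1, 1, 4, 6, 8, 4, 3, 3, 3)
it = PushbackIterator(seq)
ones = take_run(lambda x: x == 1, it)
evens = take_run(lambda x: x % 2 == 0, it)
threes = take_run(lambda x: x == 3, it)
rest = list(it)  # anything left after the last predicate

print(ones, evens, threes, rest)  # [1, 1, 1, 1] [4, 6, 8, 4] [3, 3, 3] []
```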

Another, more itertools way to do it might be with groupby:

import itertools

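class Grouper(object):
    """Stateful key function: returns the index of the current predicate."""

    def __init__(self, predicates):
        self.predicates = iter(predicates)
        self.predicate = next(self.predicates)
        self.key = 0

    def __call__(self, element):
        # When the current predicate fails, advance to the next one and bump
        # the key so that groupby starts a new group.
        if not self.predicate(element):
            self.key += 1
            self.predicate = next(self.predicates)

        return self.key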

def multi_takewhile(predicates, iterable):
    for _, group in itertools.groupby(iterable, Grouper(predicates)):
        yield tuple(group)

seq = [1, 1, 1, 1, 4, 6, 8, 4, 3, 3, 3]
ones, evens, threes = multi_takewhile([(lambda x: x == 1), (lambda x: x%2 == 0), (lambda x: x == 3)], seq)
Blender

groupby will work here for arbitrary predicates, given a carefully crafted key function:

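import itertools

def f1(x): return x == 1
def f2(x): return x%2 == 0
def f3(x): return x == 3
fs = [f1, f2, f3]

# Key each element by the first predicate it satisfies (None if it satisfies none).
def keyfunc(x): return next((f for f in fs if f(x)), None)

# `data` is the input sequence, e.g. `seq` from the question.
for k, vals in itertools.groupby(data, keyfunc):
    assert k in {f1, f2, f3, None}
    print(k, list(vals))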

Note that this can produce the same key more than once, for instance for [1, 1, 3, 1, 3], where the 1s and the 3s each show up as two separate groups.
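A small sketch of that case, reusing the same predicates and key function; the function names are printed instead of the function objects for readability:

```python
import itertools

def f1(x): return x == 1
def f2(x): return x % 2 == 0
def f3(x): return x == 3
fs = [f1, f2, f3]

def keyfunc(x):
    # First predicate the element satisfies, or None if it satisfies none.
    return next((f for f in fs if f(x)), None)

data = [1, 1, 3, 1, 3]
groups = [(k.__name__, list(vals)) for k, vals in itertools.groupby(data, keyfunc)]
print(groups)  # [('f1', [1, 1]), ('f3', [3]), ('f1', [1]), ('f3', [3])]
```

This happens because `groupby` only merges consecutive elements with equal keys, so a key that reappears later starts a new group.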

Eric
  • You don't even need quite that sneaky a key function to handle this case (see my answer). A similar technique might be useful for other kinds of cases though. – BrenBarn Aug 29 '16 at 01:05
  • @BrenBarn: Right, I was going for a general solution here – Eric Aug 29 '16 at 03:45

Your example can be handled by groupby:

>>> [list(g) for ix, g in itertools.groupby(seq, lambda x: 0 if x%2==0 else x)]
[[1, 1, 1, 1], [4, 6, 8, 4], [3, 3, 3]]
BrenBarn