
UPDATE: the example now lists the desired results.

I find myself writing lots of functions that search through some data, where I want to let the caller specify behaviours when matches are found: they might print something out or add it to one of their data structures, but it's also highly desirable to be able to optionally return found data for further transmission, storage or processing.

Example

def find_stuff(visitor):    # library search function
    for x in (1, 2, 3, 4, 5, 6):
        visitor(x)

First client usage:

def my_visitor(x):   # client visitor functions (also often use lambdas)
    if x > 3:
        yield x / 2   #>>> WANT TO DO SOMETHING LIKE THIS <<<#

results = find_stuff(my_visitor)   # client usage

results should yield 4/2, 5/2, then 6/2... i.e. 2, 2, 3 (Python 2 integer division).

Second client usage:

def print_repr_visitor(x):
    print(repr(x))

find_stuff(print_repr_visitor)     # alternative usage

should print 1 2 3 4 5 6 (separate lines) but yield nothing

But the yield doesn't make "results" a generator (at least with Python 2.6.6, which I'm stuck with).
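A minimal sketch of what is going on, reusing the question's names (with `//` so the arithmetic matches Python 2's integer `/` on both Python 2 and 3): because `my_visitor` contains `yield`, calling it only creates a generator object without running its body. `find_stuff` throws that object away, and since `find_stuff` itself has no `yield`, it is an ordinary function that returns `None`.

```python
def find_stuff(visitor):  # the question's library function, unchanged
    for x in (1, 2, 3, 4, 5, 6):
        visitor(x)  # builds a generator object, then discards it unused

def my_visitor(x):
    if x > 3:
        yield x // 2  # // floors, like Python 2's "/" on ints

results = find_stuff(my_visitor)
print(results)  # None: find_stuff has no yield, so it is a plain function

print(list(my_visitor(5)))  # [2]: the visitor's body only runs when iterated
print(list(my_visitor(1)))  # []: a generator that never reaches yield is empty
```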


What I've tried

I've been hacking this up, often like this...

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        val = visitor(x)
        if val is not None:
            yield val

...or sometimes, when the list of visitor parameters is a pain to type out too many times...

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        val = visitor(x)
        if val == 'yield':
            yield x
        elif val is not None:
            yield val

The Issues / Question

These "solutions" are not only clumsy (they need explicit built-in support from the "find" routine), but they also remove the sentinel values from the set of results the visitor can yield back to the top-level caller...

    Are there better alternatives in terms of concision, intuitiveness, flexibility, elegance etc?

Thanks!

Tony Delroy
  • I don't really understand what you're asking for. Can you give an example of how you want to use the function(s) and what you want the result to be? I guess I don't get why you don't just always yield the result of `visitor(x)` in your `find_stuff` function. – BrenBarn Jan 22 '14 at 07:31
  • You can use `object()` to create a unique sentinel value, if that's the problem. A generator function returns a generator object, not `None`, so I am not sure what the point of `if val is not None` is there. – Ashwini Chaudhary Jan 22 '14 at 07:36
  • Yeah, more info would be helpful... I don't see any way around what you posted, given your constraints. I read this as: you want to iterate over a generator/iterable, where some calls change state but don't return a value, others return a value, and sometimes this is the lookup element and sometimes it's post-processed data? Seems like any more elegant solution is going to be pretty domain-specific... One way to simplify: if `my_visitor` wants to return the lookup value (i.e. the `if val == 'yield'` term), instead of returning a sentinel it should just return `x`. – Corley Brigman Jan 22 '14 at 07:39
  • @BrenBarn: I've added an example / results. The reason I don't always yield is that the visitor might not return any results - per the Q "might print something out or...". user2357112 says below that yield will treat no visitor return value like an empty iterable, but I'm not seeing that in my python 2.6.6 test case - will double check. – Tony Delroy Jan 22 '14 at 08:16
  • @AshwiniChaudhary: if the visitor is filtering the values to be yielded to the caller of `find_stuff`, then it needs some way to instruct `find_stuff` not to yield anything - not even `None`: that's why there's a sentinel. `object()` sounds like a better sentinel - thanks for the suggestion! – Tony Delroy Jan 22 '14 at 08:19
  • @CorleyBrigman: yes, it does feel like I'm forced to be overly "domain-specific" each place I do this. In various places I've had sentinels for returning the original values, sometimes with various processing, vs. the visitor's return value or yielding nothing at all. That's why I'm trying to find a way to give the visitor the freedom to choose from all these weird behaviours rather than hardcode support for a list of them and coordinate with the sentinel. Insisting the visitor return `(yield_yes_no, what_to_yield)` is sufficiently flexible but mildly painful to use. – Tony Delroy Jan 22 '14 at 08:22
  • I finally found something that's (close to?) a duplicate - http://stackoverflow.com/questions/3074784/invoking-yield-for-a-generator-in-another-function - so I'll vote to close my own question, but not delete it (yet?)...! Cheers all. – Tony Delroy Jan 22 '14 at 08:33

4 Answers


In Python 3.3 and later, you can use yield from to yield items from a subgenerator:

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        yield from visitor(x)

In Python 2, you have to loop over the subgenerator. This takes more code and doesn't handle a few edge cases, but it's usually good enough:

def find_stuff(visitor):
    for x in (1, 2, 3, 4, 5):
        for item in visitor(x):
            yield item

The edge cases are things like trying to send values or throw exceptions into the subgenerator. If you're not using coroutine functionality, you probably don't need to worry about them.
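A small sketch of the `send()` edge case, assuming Python 3.3+ for the `yield from` version; `subgen`, `delegate_yield_from` and `delegate_for_loop` are hypothetical names chosen for illustration. With `yield from`, a value sent into the outer generator reaches the subgenerator. With the plain `for` loop, the sent value is dropped at the outer `yield item`, and the loop then resumes the subgenerator with `next()`, which sends `None`.

```python
def subgen():
    # Coroutine-style subgenerator that reports what was sent into it.
    got = yield "ready"
    yield "subgen received %r" % (got,)

def delegate_yield_from():  # Python 3.3+ only
    yield from subgen()

def delegate_for_loop():  # the Python 2 style loop
    for item in subgen():
        yield item  # a value sent in here is discarded

g = delegate_yield_from()
print(next(g))          # ready
print(g.send("hello"))  # subgen received 'hello'

g = delegate_for_loop()
print(next(g))          # ready
print(g.send("hello"))  # subgen received None
```

`yield from` also forwards `throw()` and `close()` to the subgenerator, so the loop version only differs when the generators are driven as coroutines.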

user2357112
  • +1 Could you please explain the edge cases, with examples, if possible? – thefourtheye Jan 22 '14 at 07:35
  • This is true, but it's not clear to me that this is really what the question is asking for. – BrenBarn Jan 22 '14 at 07:36
  • it looks like this is supposed to be sort of a 'filter with side effects'. – Corley Brigman Jan 22 '14 at 07:41
  • @BrenBarn: It seems to handle everything yours does and more with no modification to the `visitor` code. In your version, the visitor needs to return a signal value indicating whether to `yield` anything. With a `for` loop around the visitor, the visitor can just `yield` the value or not. Can you explain what you think the question is asking for? – user2357112 Jan 22 '14 at 07:57
  • I was missing that this will correctly yield nothing if `visitor` just returns without yielding, so perhaps it's better than I thought. The use of "tiny" generators in the question, though --- yielding only one value or not yielding anything --- seems a bit perverse to me. If the visitor is just manipulating a single potential "search result", I feel like it should just be returning and not be a generator at all. We'll see if the OP clarifies his intent. – BrenBarn Jan 22 '14 at 08:03
  • @user2357112: I'm stuck with the same issue that confused BrenBarn: when I call `list(find_stuff(a4))` function with visitor `def a4(x): if x > 4: yield x`, I get `TypeError: 'list' object is not callable`. Will see if I can get anything working.... – Tony Delroy Jan 22 '14 at 08:26
  • @TonyD: You most likely named a variable `list`, shadowing the built-in. Rename it. – user2357112 Jan 22 '14 at 08:28
  • @user2357112: Oh yikes - so I did. Thanks again. This is relatively painless then - even arguably better than being fully implicit, for the documentation value re. `find_stuff` potentially `yield`ing visitor return values. I never imagined a visitor that didn't return anything could be iterated over..... – Tony Delroy Jan 22 '14 at 08:37
  • @TonyD: Actually, that might be an issue. While a generator that never hits a `yield` can still be iterated over (and must be iterated over to run its code), if the visitor doesn't contain a `yield` at all, it won't be a generator and needs to be handled differently. You'll either need to require that all visitors be generators even if they never yield anything, or you'll need to have a different API for non-generator visitors. – user2357112 Jan 22 '14 at 09:07
  • @user2357112: ah, I see. TBH, I'd also found that in practice I often have something like `if visitor_return_value == 'break': break` too, or to give a choice between yielding "raw" and "cooked" values without the complexity of cooking falling on the visitor. Once something's forced me to store the visitor return value into a var for testing instead of unconditionally yielding it, I was back where I started anyway.... – Tony Delroy Jan 22 '14 at 09:16
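One way around the generator/non-generator split described in the comments, sketched under the assumption that a visitor may be a generator, may return any iterable, or may return nothing at all: treat a `None` return as "nothing to yield" and iterate anything else. Because values travel inside the returned iterable, no yieldable value is reserved as a sentinel (a visitor that wants to yield `None` returns `(None,)`). The visitor names below are hypothetical.

```python
def find_stuff(visitor):
    """Sketch: accept generator visitors, iterable-returning visitors,
    and side-effect-only visitors that return None."""
    for x in (1, 2, 3, 4, 5, 6):
        result = visitor(x)
        if result is None:
            continue  # plain function with no return value: yield nothing
        for item in result:  # generator, list, tuple, ... all work
            yield item

def halve_big(x):  # generator visitor
    if x > 3:
        yield x // 2

def odd_only(x):  # plain visitor returning an iterable
    return [x] if x % 2 else []

def print_repr_visitor(x):  # side effects only, implicitly returns None
    print(repr(x))

print(list(find_stuff(halve_big)))  # [2, 2, 3]
print(list(find_stuff(odd_only)))   # [1, 3, 5]
for _ in find_stuff(print_repr_visitor):  # must still be iterated to run
    pass
```

Two caveats: a plain visitor must wrap a single result, e.g. `return (x // 2,)`, since a bare integer is not iterable; and because `find_stuff` is now itself a generator, a side-effect-only call such as `find_stuff(print_repr_visitor)` does nothing until it is iterated.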

If I understand right, perhaps you want something like this:

def find_stuff(visitor):
    for x in [1, 2, 3, 4, 5]:
        match, val = visitor(x)
        if match:
            yield val

def my_visitor(x):
    if x > 4:
        return True, x/2
    else:
        return False, None

That is, have the visitor return two things: the value to be yielded, if any, and a boolean indicating whether to yield the value. This way any value can be yielded.

The title of your question seems to suggest that you want my_visitor to somehow decide whether or not find_stuff yields a value on each iteration, but you don't actually describe this in the question. In any case, it isn't possible. A generator can call another function to decide what to yield, but there's no way for the called function to magically make its caller yield or not yield; that decision has to be made within the caller (find_stuff in this case).

From your question, though, I don't understand why this is a problem. You say that your proposed solutions are "clumsy - needing explicit built-in support from the "find" routine" but I don't see how that's clumsy. It's just an API. find_stuff obviously will have to have "built-in support" for doing what it's supposed to do, and the visitors will have to know what to return to communicate with the caller. You can't expect to be able to write a my_visitor function that works with any find routine anyone might come up with; the system as a whole will have to define an API that describes how to write a visitor that find_stuff can use. So you just need to come up with an API that visitors have to follow. My example above is one simple API, but it's hard to tell from your question what you're looking for.

BrenBarn
  • "somehow decide whether or not find_stuff yields a value on each iteration, but you don't actually describe this in the question." - that's the crux of it, yes. "In any case, it isn't possible" - and that's the rub, apparently. Re "why this is a problem" and "define an API" - I just habitually try to keep visitor functionality orthogonal from the algo calling it, which makes both more reusable and flexible, but I guess I'm reaching too far. Thanks for the insights. – Tony Delroy Jan 22 '14 at 08:32

I did find a solution for this with some investigation, and in Python 2.6. It's a little weird, but it does appear to work.

from itertools import chain

def my_visitor(x):
    if x > 3:
        yield x / 2

def find_stuff(visitor):
    search_list = (1,2,3,4,5,6)
    return (x for x in chain.from_iterable(visitor(x) for x in search_list))

>>> find_stuff(my_visitor)
<generator object <genexpr> at 0x0000000047825558>
>>> list(find_stuff(my_visitor))
[2, 2, 3]

as expected. The generator is nice, as you can do things like this:

def my_visitor2(x):
    if x > 3:
        yield x / 2
    elif x > 1:
        yield x
        yield x*2
        yield x-3

>>> list(find_stuff(my_visitor2))
[2, 4, -1, 3, 6, 0, 2, 2, 3]

and have each visit return no values, a single value, or a bunch of values, and they'll all get into the result.

You could also adapt this to scalar values. The best way would be with a nested generator expression:

sentinel = object()

def my_scalar_visitor(x):
    if x > 3:
        return x / 2
    else:
        return sentinel

def find_stuff_scalar(scalar_visitor):
    search_list = (1, 2, 3, 4, 5, 6)
    # compare by identity: sentinel is a unique object no visitor result can equal
    return (x for x in (scalar_visitor(y) for y in search_list) if x is not sentinel)

>>> list(find_stuff_scalar(my_scalar_visitor))
[2, 2, 3]
Corley Brigman

user2357112's answer solves the problem given by the question, but it seems to me that the generator-within-a-generator approach is overcomplicated for this specific situation, and limits the client's options for using your code.

You want to traverse some structure, apply some function, and yield the results. Your code allows for this, but you are conflating two ideas that Python already has excellent, separate support for (traversing and mapping) with no extra benefits.

Your traversal function could simply traverse:

def traverse_stuff():
    for x in (1, 2, 3, 4, 5, 6):
        yield x

And when we want to consume, you or your client can use list comprehensions, combinators such as map and filter, or just simple for loops:

[x / 2 for x in traverse_stuff() if x > 3]

map(lambda x: x / 2, filter(lambda x: x > 3, traverse_stuff()))

for value in traverse_stuff():
    print(value)

Splitting the code in this way makes it more composable (your client is not limited to the visitor pattern/generators), more intuitive for other Python developers, and more performant for cases where you only need to consume part of the structure (e.g., when you only need to find some n number of nodes from a tree, when you only want to find the first value in your structure that satisfies a condition, &c.).
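A sketch of the partial-consumption point, reusing `traverse_stuff` from above; the `visited` list is hypothetical instrumentation added only to show how far the traversal actually runs. `next()` with a default (available since Python 2.6) gives the first match, `itertools.islice` gives the first n values, and in both cases the generator is no longer advanced once the result is known.

```python
from itertools import islice

visited = []  # hypothetical instrumentation: records how far traversal got

def traverse_stuff():
    for x in (1, 2, 3, 4, 5, 6):
        visited.append(x)
        yield x

# First value satisfying a condition, or None if there is none.
first_big = next((x for x in traverse_stuff() if x > 3), None)
print(first_big)  # 4
print(visited)    # [1, 2, 3, 4]: values 5 and 6 were never produced

# First n values.
del visited[:]
first_two = list(islice(traverse_stuff(), 2))
print(first_two)  # [1, 2]
```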

gntskn
  • Solid points, though a large part of the appeal of a visitor pattern is the simplicity and inflexibility itself: when it suits an application's needs it's very self-documenting. I also like the option of having the processing (with any filtering) localised in one functor, rather than separate in the source as it is in the comprehension and `map`/`filter` solutions you've suggested. Still, your perspective highlights the options and in python 2 when a visitor implementation is getting knotty, your alternatives are massively better than letting the code get twisted as in my question! Thanks! – Tony Delroy Dec 20 '16 at 00:39