I have a Python function whose output is a generator:

    def main_function(x):
        r = get_range()
        for i in range(r):
            yield x+i

I want to refactor the code (I've simplified the use case, but the actual computation might be more complex and longer; see the EDIT below). Based on my understanding, this is what I should do to keep the functionality unchanged:

(a) Same interface as original code

    def sub_function(x,r):
        for i in range(r):
            yield x+i    

    def main_function(x):
        r = get_range()
        return sub_function(x,r)

Compared with these other approaches:

(b) This would return a generator that yields a generator (are there any advantages to this approach?)

    def sub_function(x,r):
        for i in range(r):
            yield x+i    

    def main_function(x):
        r = get_range()
        yield sub_function(x,r)

(c) This would defeat the purpose of a generator (Is that correct?)

    def sub_function(x,r):
        return [x+i for i in range(r)]

    def main_function(x):
        r = get_range()
        for i in sub_function(x,r):
            yield i

EDIT: Comments point out that the right answer depends on the use case. My use case is parsing an XML file to extract fields and write them to a database; this part is delegated to sub_function(). I also asked this question to get a general understanding of how nested yield can be used when refactoring code.
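For illustration only, a minimal sketch of how that use case could look with approach (a). It assumes a Python 3 standard library; the XML layout, the `records` table, and the choice to let the caller do the database write are all hypothetical and not taken from the real code.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input; element and field names are made up for illustration.
XML = b"""<records>
  <record><name>a</name><value>1</value></record>
  <record><name>b</name><value>2</value></record>
</records>"""

def sub_function(source):
    # Yield one (name, value) tuple per <record> instead of building a list.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            yield elem.findtext("name"), int(elem.findtext("value"))
            elem.clear()  # drop the parsed subtree once it has been consumed

def main_function(source):
    # Approach (a): same interface, the caller still gets an iterator of rows.
    return sub_function(source)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, value INTEGER)")
# executemany accepts any iterable of parameter tuples, including a generator.
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 main_function(io.BytesIO(XML)))
rows = conn.execute("SELECT name, value FROM records ORDER BY name").fetchall()
```

Here the generator only extracts fields, and the database write happens in the consumer, so no intermediate list of rows is built.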

Soroush
  • Unfortunately, the practical answer is going to depend on the differences between this simplification and your actual use case. If the real `main_function` needs to do anything significant, you might be forced to modify the generated sequence... – Karl Knechtel Jun 28 '11 at 07:37
  • Without a concrete use case it is impossible to give a definitive answer. Use whichever fits your current needs the best. – Duncan Jun 28 '11 at 07:39

1 Answer

You're right; the initial example and a) do the same thing since both return a generator.
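There is one subtle difference that a small sketch can show. With `return sub_function(x, r)`, `main_function` becomes an ordinary function, so `get_range()` runs as soon as `main_function` is called, whereas in the original generator version it runs only when the first item is requested. Assuming Python 3.3 or later, `yield from` keeps that deferred behaviour. The `get_range` below is a stand-in that records its calls, and the names `main_return` and `main_yield_from` are made up for the comparison.

```python
calls = []  # records every call to get_range so we can see when it runs

def get_range():
    # Stand-in for the question's get_range(); the value 3 is an assumption.
    calls.append("get_range")
    return 3

def sub_function(x, r):
    for i in range(r):
        yield x + i

def main_return(x):
    # Option (a): a plain function that returns the sub-generator.
    r = get_range()
    return sub_function(x, r)

def main_yield_from(x):
    # Python 3.3+: still a generator function, so the body is deferred.
    r = get_range()
    yield from sub_function(x, r)

gen_a = main_return(10)          # get_range() has already run here
calls_after_return = len(calls)

gen_b = main_yield_from(10)      # nothing in the body has run yet
calls_after_yield_from = len(calls)

items_a = list(gen_a)
items_b = list(gen_b)            # get_range() runs now, on first iteration
```

Both versions produce the same items; they differ only in when the setup code executes, which matters if `get_range()` is expensive or has side effects.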

b) is different: It returns a generator which yields a single element (which is another generator). To use that, you need two loops (one over the outer and one over the inner generator).

There are no advantages per se, but it can sometimes be useful to build nested generators.
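As a rough sketch of what consuming (b) looks like, with a fixed `r = 3` standing in for `get_range()`: the outer generator yields exactly one item, which is the inner generator, so either two loops or a flattening helper such as `itertools.chain.from_iterable` is needed.

```python
from itertools import chain

def sub_function(x, r):
    for i in range(r):
        yield x + i

def main_function(x):
    r = 3  # stand-in for get_range(); the value is an assumption
    yield sub_function(x, r)  # yields the generator object itself

# Two nested loops: the outer loop runs once, the inner loop sees the values.
flat = []
for inner in main_function(10):
    for value in inner:
        flat.append(value)

# Alternative: flatten one level with itertools.
flat_chain = list(chain.from_iterable(main_function(10)))

outer_length = len(list(main_function(10)))  # number of items the outer yields
```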

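c) is worse for laziness: `[x+i for i in range(r)]` is a list comprehension, so it builds the whole list in memory before `main_function` yields its first item (a generator expression, `(x+i for i in range(r))`, would stay lazy). For a small range the extra cost is minor, but for large inputs it does defeat the purpose of a generator.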
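A small experiment, offered as a sketch, makes the evaluation order in (c) visible. The `log` list and the `compute` helper are hypothetical, added only to trace when each item is produced, and `r = 3` stands in for `get_range()`.

```python
log = []  # records the index of every item as it is computed

def compute(x, i):
    log.append(i)
    return x + i

def sub_list(x, r):
    return [compute(x, i) for i in range(r)]   # list: all r items built now

def sub_gen(x, r):
    return (compute(x, i) for i in range(r))   # generator: items on demand

def main_function(x, sub):
    r = 3  # stand-in for get_range(); the value is an assumption
    for item in sub(x, r):
        yield item

log.clear()
first_from_list = next(main_function(10, sub_list))
work_done_list = len(log)   # every item was computed before the first yield

log.clear()
first_from_gen = next(main_function(10, sub_gen))
work_done_gen = len(log)    # only the first item was computed
```

With the list version, asking for one item costs as much as asking for all of them; with the generator expression, the work tracks what the consumer actually requests.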

Aaron Digulla
  • have you profiled generators against list comprehensions for this simple case? – Paulo Scardine Jun 28 '11 at 07:47
  • At least for 2.x: `[x for x in y]` creates a list. `(x for x in y)` creates a generator (and also does not clobber/create `x` in `locals()`). – Karl Knechtel Jun 28 '11 at 07:55
  • I did once and compared `list( (x for x in l) )` vs. `[x for x in l]` and found that in Python 2 the first (1.96 msec) was slower than the second (1.44 msec) (for a list of 10000 items, using `python -mtimeit "l=range(10000)" "list( (x for x in l) )"` and `python -mtimeit "l=range(10000)" "[x for x in l]"`), but in Python 3 the second should be equal to the first, as they removed the cheat that list comprehensions used in Python 2 and turned the second into the first. – Dan D. Jun 28 '11 at 07:59
  • I know this answer is over 11 years old. But, at least in Python 3 today, shouldn't alternative "a" use `yield from` rather than `return` to be equivalent? (Or alternatively `for i in sub_function(x, r): yield i`.) – Arjan Dec 04 '22 at 12:34
  • @Arjan I don't know. I've never used `yield from` and didn't have much time with Python 3, yet. – Aaron Digulla Dec 12 '22 at 10:09