11

I have an object subclass which implements a dynamic dispatch __ iter __ using a caching generator (I also have a method for invalidating the iter cache) like so:

def __iter__(self):
    print("iter called")
    if self.__iter_cache is None:
        iter_seen = {}
        iter_cache = []
        for name in self.__slots:
            value = self.__slots[name]
            iter_seen[name] = True
            item = (name, value)
            iter_cache.append(item)
            yield item           
        for d in self.__dc_list:
            for name, value in iter(d):
                if name not in iter_seen:
                    iter_seen[name] = True
                    item = (name, value)
                    iter_cache.append(item)
                    yield item
        self.__iter_cache = iter_cache
    else:
        print("iter cache hit")
        for item in self.__iter_cache:
            yield item

It seems to be working... Are there any gotchas I may not be aware of? Am I doing something ridiculous?

Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
Emanuel Landeholm
  • 1,396
  • 1
  • 15
  • 21
  • 2
    I would at least use a [`set`](http://docs.python.org/library/stdtypes.html#set) instead of a `dict` for the `iter_seen` structure. – Jonas Schäfer Jul 05 '12 at 15:11
  • Hm, what would that gain me really? Since I don't need set algebra, wouldn't dict be a more reasonable and lightweight implementation? – Emanuel Landeholm Jul 05 '12 at 15:20
  • 2
    replace `for _ in iter(whatever)` with `for _ in whatever`. You never need `iter` inside `for` statement – jfs Jul 05 '12 at 15:30
  • 1
    you could use self.__slots.viewitems() to get both name and value if it supports it – jfs Jul 05 '12 at 15:37
  • 2
    the performance set vs. dict is almost the same but semantically a set is a better fit here: `seen = set(); ... if name not in seen: ... seen.add(name);` – jfs Jul 05 '12 at 15:42
  • I have always used {} for this pattern, but your arguments are persuasive... – Emanuel Landeholm Jul 05 '12 at 16:09
  • 1
    @EmanuelLandeholm: set is more lightweight than dict, that's why it was added. The set methods don't add weight. The `True` values in your dict do. – bukzor Jul 05 '12 at 16:23
  • 1
    `for x in iter(y)` is redundant. The for statement will make an iterator as needed. `for x in y` is idomatic python. – bukzor Jul 05 '12 at 16:38

4 Answers4

7

container.__iter__() returns an iterator object. The iterator objects themselves are required to support the two following methods, which together form the iterator protocol:

iterator.__iter__()

Returns the iterator object itself.

iterator.next()

Return the next item from the container.

That's exactly what every generator has, so don't be afraid of any side-effects.

dor00012
  • 362
  • 4
  • 17
Maksym Polshcha
  • 18,030
  • 8
  • 52
  • 77
  • 5
    Making an container object's `__iter__()` method a generator by using one or more `yield` statements is a common shortcut which avoids having to explicitly define and code a separate iterator class and its methods. – martineau Jul 05 '12 at 16:22
3

It seems like a very fragile approach. It is enough to change any of __slots, __dc_list, __iter_cache during active iteration to put the object into an inconsistent state.

You need either to forbid changing the object during iteration or generate all cache items at once and return a copy of the list.

jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • True. __slots is only changed by __setitem__ or __delitem__, I could easily forbid those ops (raise exc) when a generator is active. __dc_list is currently only set/changed in __init__, if I add a method to update it (I likely will) I need to copy the forbid semantics from __slots. __iter_cache is not a problem. It is only ever updated by __iter__, and only after the whole sequence has been enumerated. – Emanuel Landeholm Jul 05 '12 at 15:39
  • 1
    multiple concurrent iterations are similar to multithreading in some aspects. It much easier to reason about it if the objects are immutable. Imagine three iterations: the first populates the cache, the third uses the cache, the second starts some time before the cache is set but after the object is changed (it might see newer values then the 3rd iteration that was started after it) – jfs Jul 05 '12 at 15:55
  • 1
    I love this site. Your quick and to-the-point answers have dramatically increased my Py knowledge, thanks guys! – Emanuel Landeholm Jul 05 '12 at 16:16
2

It might be better to separate the iteration of the object from the caching of the values it returns. That would simplify the iteration process and allow you to easily control how the caching is accomplished as well as whether it is enabled or not, for example.

Another possibly important consideration is the fact that your code would not predictively handle the situation where the object being iterated over gets changed between successive calls to the method. One simple way to deal with that would be to populate the cache's contents completely on the first call, and then just yield what it contains for each call -- and document the behavior.

martineau
  • 119,623
  • 25
  • 170
  • 301
0

What you're doing is valid albeit weird. What is a __slots or a __dc_list ?? Generally it's better to describe the contents of your object in an attribute name, rather than its type (eg: self.users rather than self.u_list).

You can use my LazyProperty decorator to simplify this substantially.

Just decorate your method with @LazyProperty. It will be called the first time, and the decorator will then replace the attribute with the results. The only requirement is that the value is repeatable; it doesn't depend on mutable state. You also have that requirement in your current code, with your self.__iter_cache.

def __iter__(self)
    return self.__iter

@LazyProperty
def __iter(self)
    def my_generator():
        yield whatever
    return tuple(my_generator())
bukzor
  • 37,539
  • 11
  • 77
  • 111
  • Weird, possibly. slots is the objects own (overridden) attributes and dc_list is a list of prototype objects to (recursively) copy slots from. I'm trying to implement something like Self's delegation mechanishm in Py. – Emanuel Landeholm Jul 05 '12 at 16:57
  • `__iter__` must return an iterator, tuple isn't one. – jfs Jul 06 '12 at 12:13