
I recently read that one benefit of map in Python 3 is that it is lazy. That means it is better to do

map(lambda x: x**2, range(10**100))

rather than

[x**2 for x in range(10**100)]

What I'm curious about is how I can put this laziness to use. If I generate the map object, how can I, for example, access a particular element of the resulting operation/list? In almost all the documentation on map I've seen, they do something like print(map(...)) or for i in map(...), which (as far as I understand it) relinquishes the lazy concept as it implicitly converts the map to a list.

I suppose what I'm looking for is the ability to use map objects in a similarly lazy fashion to range, where I can do x = range(10**100) and lazily evaluate x[10000] without a huge computational load.

If this concept doesn't exist, what is the benefit then of having map be lazy? If you always need to convert it to some non-lazy object like a list, why does it matter that map is lazy?

zephyr
  • If you need element 10,000, you probably have to calculate the preceding 10,000 (unless you have a simple function like `x ** 2`, in which case just do `10000 ** 2`); you can't *"index into"* it. The laziness is an advantage because you don't need to hold the whole list in memory at once. – jonrsharpe May 24 '16 at 15:02
  • Then my point becomes, how is a lazy map useful (and not redundant)? Just use a generator if you're concerned about memory. – zephyr May 24 '16 at 15:05
  • Iterating over a `map` in your example *isn't* any better than iterating over the equivalent generator expression. **Both are lazily evaluated**. The advantage **over a list comprehension** is that you don't store the whole list (which may be unnecessary or even impossible), which is only an advantage if you don't need it all at once. It's not clear what you're trying to ask: `map` vs a list? `map` vs a generator expression? If you don't need `map`, just *don't bother*. If you prefer the syntax of the generator expression, *use that*. – jonrsharpe May 24 '16 at 15:07
  • The problem is that you could have something like `x = 0; def f(y): global x; x+= 1; return x+y` and then `map(f, iterable)`. Now the value of the 10000th element **requires** evaluation of all previous calls to `f` (due to changing shared state between calls), so if you really wanted a lazy `map` where you can access the nth element without performing previous calls you are already restricting it to only have pure functions as arguments. – Bakuriu May 24 '16 at 15:23
  • @zephyr The laziness is useful because with `for x in map(f, iterable): print(x)`, if `map` is lazy the memory required is only the space for one element of `iterable` plus its result (the `x`). If `map` weren't lazy, it would have to first build a complete list of values (which could be huge). In fact, consider `from itertools import count`, then `for x in map(lambda x: x + 1, count()): print(x)`. With a lazy map this prints the numbers `1\n2\n3...` ad infinitum; if `map` is not lazy, that code is simply going to kill your computer, since you cannot build an infinite list in memory. – Bakuriu May 24 '16 at 15:24
  • @Bakuriu That's a good point. I suppose I was hoping I could lazily access elements from map if I had some lazy iterable like `range`, but I guess it was just wishful thinking. – zephyr May 24 '16 at 15:35

3 Answers


You are comparing apples to oranges here. range is not just a lazy iterable. It is a specific object whose contents satisfy specific laws, and those laws allow it to support many operations without actually building a huge sequence in memory. That's because the nth element of a range is basically just start + n*step (subject to stop, signs, etc.).
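For illustration, indexing and membership tests on a range are pure arithmetic, so even an astronomically large range costs essentially nothing:

r = range(0, 10**100, 2)
print(r[10**50])        # r.start + 10**50 * r.step, computed instantly
print(2 * 10**50 in r)  # True: membership is O(1) arithmetic, not a scan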

However, map is meant to work with any function f. In particular, functions may have shared/global state, which already defeats any chance of evaluating map(f, something)[100] without performing the preceding 100 function calls: skipping them would break the correctness of the result.
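As a minimal sketch of such a stateful f (a contrived function, invented just to make the point), note how each result depends on how many calls came before it:

calls = 0

def f(y):
    global calls
    calls += 1          # shared state mutated on every call
    return y + calls

print(list(map(f, range(5))))  # [1, 3, 5, 7, 9] -- values depend on call order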

That map is lazy simply means that it doesn't immediately build a complete list of results, but waits for you to request the next result before calling f and producing it. This avoids building unnecessary lists in code like:

for x in map(f, iterable):
    # do something with x

where, if map were eager, it would consume roughly twice the memory of iterable just to do the loop; with a lazy map the only extra space required is basically that of x.

Moreover, it allows calling map on infinite iterables like count(). This obviously results in a program that keeps doing something forever, or you can simply stop consuming the map at some point. An eager map cannot handle this case at all.
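For example, a sketch using itertools (islice is used here only to stop consuming after a few elements):

from itertools import count, islice

squares = map(lambda x: x ** 2, count())  # an infinite lazy stream of squares
print(list(islice(squares, 5)))           # [0, 1, 4, 9, 16] -- only five calls made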

If you want your own restricted map that works only on pure functions and allows random access, you could write your own class:

class PureMap:
    """A map that assumes function is pure, so results can be computed on demand."""

    def __init__(self, function, sequence):
        self._f = function
        self._sequence = sequence

    def __iter__(self):
        # Plain iteration stays lazy, exactly like the builtin map.
        return map(self._f, self._sequence)

    def __getitem__(self, i):
        # Random access: apply f only to the element actually requested.
        return self._f(self._sequence[i])
    # etc.
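Used with a range, this gives exactly the kind of random access the question asks about:

m = PureMap(lambda x: x ** 2, range(10**100))
print(m[10000])  # 100000000 -- a single call to f, nothing else is computed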

However even in this case you have some problems:

  1. If sequence is actually a one-shot iterable, then to obtain the nth element you have to consume the first n elements. After that you'd have to store them as a sequence in your class for future use. But this already defeats the purpose of the whole thing, since doing PureMap(f, sequence)[1000] requires you to store 1000 elements in memory anyway, even though it avoids 999 calls to f.

  2. You want to avoid calling f multiple times on the same element. This means you'd also have to keep track of which elements have already been computed and which haven't.

The only situation where you could achieve what you want is the following:

  • The function being called is pure.
  • The iterable argument is something like range that allows random access without having to produce the other elements.
  • The function is fast enough that you can afford to recompute it on various elements without worrying too much about performance.

When all those assumptions are met you can have a map object that "works like range".

Bakuriu
  • Didn't you overlook the case where `sequence` is some kind of aggregate? That's the case that led me to this question. Where I want to present an array to another interface and that array is derived from another array, but I know the callee probably won't use the whole thing so computing a whole new array would be folly. – sh1 Jul 31 '23 at 22:11

There are many benefits; for example, it makes it easier to write memory-efficient code.

def take_up_a_lot_of_memory(*args):
    """
    A contrived example of a function that uses a lot of memory per call
    """
    return sum([i ** 2 for i in range(10 ** 6)])

# Because map is lazy, each result is handed straight to sum and then
# discarded, so a full list of one thousand results never exists at once.
megasum = sum(map(take_up_a_lot_of_memory, range(1000)))

Also, you can sometimes terminate a calculation early without iterating through all of the map results, and so avoid redundant work.
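For instance, a small sketch of that early-exit pattern (expensive_check is a made-up placeholder for something genuinely costly):

def expensive_check(n):
    # imagine this test were costly to evaluate
    return n % 7 == 0

# any() stops consuming the lazy map at the first True result,
# so expensive_check runs only 7 times here instead of 99
print(any(map(expensive_check, range(1, 100))))  # True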

hilberts_drinking_problem

First, please note that range (xrange in Python 2) is a special case. It's not a simple generator, nor does it just return a list. It supports `in` tests and indexing as well, which is not a standard feature of iterables or iterators.

Consider that map(func, iterable) could be called on an infinite iterable, or on an iterable where fetching the next value is a time-consuming process.

If map were eager, you'd have to be aware that your function might deal with these kinds of inputs, and make sure to reach for a lazy variant (itertools.imap in Python 2) instead. Since it's basically impossible to determine that an iterator is infinite, even at runtime, shouldn't the builtin function behave correctly for the widest range of inputs?

Not every use case requires random access, and those that do must either fully instantiate the iterable or use another itertools function, like islice.
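As a sketch, a simplified version of the nth recipe from the itertools documentation gives the x[10000]-style access the question wants, at the cost of consuming the preceding elements:

from itertools import islice

def nth(iterable, n):
    """Return the nth item of an iterable, consuming the first n items."""
    return next(islice(iterable, n, None))

print(nth(map(lambda x: x ** 2, range(10**100)), 10000))  # 100000000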

mobiusklein