
I have a task to solve, and the most important part at the moment is making the script as time-efficient as possible. One of the elements I am trying to optimize is memoization within one of the functions.

So my question is: which of the following three or four methods is the most efficient/fastest way of implementing memoization in Python?

I have provided the code only as an example - if one of the methods is more efficient in a case other than the one I mentioned, please share what you know.

Solution 1 - using mutable variable from outer scope

This solution is often shown as the canonical example of memoization, but I am not sure how efficient it is. I have heard that using global variables (in this case it is a variable from the enclosing, not global, scope) is less efficient.

def main():
    memo = {}
    def power_div(n):
        try:
            return memo[n]
        except KeyError:
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Solution 2 - using default, mutable argument

I have read somewhere that default mutable arguments were once used to pass variables from an outer scope, back when Python looked a variable up first in the local scope and then in the global scope, skipping the enclosing scope (in this case the scope inside main()). Because a default argument is initialized only once, when the function is defined, and is accessible as a local name inside the inner function, perhaps it is more efficient?

def main():
    def power_div(n, memo={}):
        try:
            return memo[n]
        except KeyError:
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Or maybe the following version (in fact a combination of solutions 1 and 2) is more efficient?

def main():
    memo = {}
    def power_div(n, memo=memo):
        try:
            return memo[n]
        except KeyError:
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Solution 3 - function's attribute

This is another quite common example of memoization in Python - the memoization cache is stored as an attribute of the function itself.

def main():
    def power_div(n):
        memo = power_div.memo
        try:
            return memo[n]
        except KeyError:
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Summary

I am very interested in your opinions about the four solutions above. It is also important that the function using memoization is nested within another function.

I know that there are other solutions for memoization too (such as a Memoize decorator), but I find it hard to believe that they are more efficient than those listed above. Correct me if I am wrong.
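For reference, the kind of decorator I mean looks roughly like this (a sketch only - the names are illustrative, not taken from any particular library):

```python
from functools import wraps

def memoize(func):
    """Cache results of a single-argument function in a plain dict."""
    cache = {}

    @wraps(func)
    def wrapper(n):
        try:
            return cache[n]
        except KeyError:
            result = cache[n] = func(n)
            return result

    return wrapper

@memoize
def power_div(n):
    return (n ** 2) % 4  # example expression, should not matter

print(power_div(3))  # → 1 (computed once, then served from the cache)
```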

Thanks in advance.

Tadeck
  • As with most "which of these is faster" questions, the ultimate answer is "try it and find out". The `timeit` module provides a very good way to test things like this. – Amber Feb 02 '12 at 06:59
  • (Also: have you profiled your existing code and found the memoization to be a bottleneck? If no, why are you focusing on optimizing it?) – Amber Feb 02 '12 at 07:00
  • @Amber: The case is 1) I have not much to optimize in my existing code, so I am trying to improve everything I can; 2) this question is more about the efficiency of the mentioned cases and why one is better than another - it is more general. I am not using `timeit` because 1) I may be missing some other, more efficient solution, and 2) my results may be biased by the way I use memoization. I am trying to find the fastest way to use memoization in order to learn it and let people know, not necessarily to fix this one piece of code (such a question would be too localized). – Tadeck Feb 02 '12 at 07:10
  • My immediate assumption would be that using the `get()` method of `dict` objects would be faster than catching `KeyError`. But it may be that the speed up would only affect the "cache miss" branch, in which case it's not worth it. But it's probably worth timing both ways. – Daniel Pryden Feb 02 '12 at 07:23
  • @DanielPryden: I have been thinking about using `get()`, but since you need to calculate something if the key has not been found, it would look like that: `memo.get(n, (n ** 2) % 4)`. In this case it would not make much sense, because `(n ** 2) % 4` would be executed every time function is called (thus memoization would be useless). – Tadeck Feb 02 '12 at 07:27
  • @Tadeck: No, just use a sentinel value (`None` unless there's a reason to use something else) and an `if` statement: `temp = memo.get(n, None); if temp is not None: return temp;` etc. As I said above, though, I'm not *sure* this will actually be faster, but it's worth testing as an alternative. – Daniel Pryden Feb 02 '12 at 07:35
  • Please **update** the question with two things. 1) Why you omitted solution 4, a callable object. 2) The `timeit` benchmark numbers from each solution. – S.Lott Feb 02 '12 at 10:51
  • @S.Lott: Ad. 2) As I said above, I believe my `timeit` tests may be biased and may not be giving an answer to the real question here (the efficiency of different ways of memoization). I will do the tests and update the question, if you insist, though. Ad. 1) I didn't even think of callable object as a possibly more efficient solution to this problem. Would you like to add more details about that? If yes, please give me something more on how such example code using this solution would look like, or post it as an answer. – Tadeck Feb 02 '12 at 11:05
  • "may be biased"? How is that even possible? It's **your** application. Nothing could be **more** relevant that you producing timing of your actual application. There's never a *general* answer to optimization -- it's *always* highly specific to the actual problem, actual application, actual configuration. What part of `collections.Callable` do you need help with? Perhaps you should start with Search and then ask a separate question, rather than clutter this question. – S.Lott Feb 02 '12 at 12:55
  • @S.Lott: The question was exactly about what Raymond answered - the efficiency of different styles of variable access. It is not that I would like someone to optimize my script, I was asking about some specific part of Python and its efficiency in different cases. As Raymond proved, there is a general answer to this general question. When it comes to your question about **Callable**: I had (have) problems figuring out what exactly _you_ had in mind. Callable object could be used here like that: http://ideone.com/EKgfY - but is it something _you_ had in mind? I do not know. Thanks. – Tadeck Feb 02 '12 at 18:08
  • "there is a general answer to this general question". Since there are an infinite number of variations on the memoized function and nature of the values which are cached, I find the "general answer" to be essentially impossible. However. You clearly are satisfied. What I had in mind was asking two questions. Why you excluded a callable object. Asking me what I had in mind doesn't make any sense. Why did you exclude a callable object? And. I asked why you failed to include `timeit` results and you answered that. Claiming there is a general answer is silly, but that's your reason. – S.Lott Feb 03 '12 at 01:30
  • Going back to the bit about `memo.get(n, (n ** 2) % 4)` defeating the memoization because the default value is evaluated every time, I'd like to point out that the expression `memo.get(n) or ((n ** 2) % 4)` avoids that quite neatly via `or` short-circuiting (i.e., the expression on the right side of `or` is only evaluated if the left side is falsy). – robru Jun 24 '12 at 08:14

2 Answers


The different styles of variable access have already been timed and compared at http://code.activestate.com/recipes/577834-compare-speeds-of-different-kinds-of-access-to-var. Here's a quick summary: local access beats nonlocal access (nested scopes), which beats global access (module scope), which beats access to builtins.

Your solution #2 (with local access) should win. Solution #3 has a slow dotted lookup (which requires a dictionary lookup on the function object). Solution #1 uses nonlocal (nested-scope) access, which uses cell variables (faster than a dict lookup, but slower than locals).
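A quick `timeit` sketch of the three access styles (the exact numbers vary by interpreter and machine; the recipe linked above reports locals ahead of nonlocals, which are ahead of globals):

```python
import timeit

setup = """
x = 1  # global-scope variable

def f_global():
    return x

def make_nonlocal():
    y = 1  # nonlocal (cell) variable
    def f_nonlocal():
        return y
    return f_nonlocal

f_nonlocal = make_nonlocal()

def f_local(z=1):  # the default argument becomes a plain local
    return z
"""

for name in ("f_local", "f_nonlocal", "f_global"):
    elapsed = timeit.timeit(name + "()", setup=setup, number=1_000_000)
    print(name, round(elapsed, 3))
```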

Also note that the KeyError exception class is a global lookup and can be sped up by localizing it. You could replace the try/except entirely and use memo.get(n, sentinel) instead. Even that could be sped up by using a bound method. And of course, your easiest speed boost may come from simply trying out PyPy :-)
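A sketch of those tweaks combined (the sentinel is a unique object, so falsy cached values such as 0 are handled correctly, and binding memo.get as a default argument makes it a fast local name):

```python
def main():
    memo = {}
    sentinel = object()  # unique marker that can never be a real cached value

    # Bind everything power_div needs as default arguments, making them locals.
    def power_div(n, memo=memo, memo_get=memo.get, sentinel=sentinel):
        result = memo_get(n, sentinel)
        if result is sentinel:      # cache miss: compute and store
            result = memo[n] = (n ** 2) % 4
        return result

    # extensive usage of power_div() here
    return power_div

power_div = main()
print(power_div(3))  # → 1
```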

In short, there are many ways to tweak this code. Just make sure it's worth it.

Raymond Hettinger
  • Thank you very much :) Do you think there is a difference in performance between using `memo=memo` (where `memo` is in the nonlocal scope) and `memo={}` (so there is no nonlocal scope involved)? – Tadeck Feb 02 '12 at 07:57
  • @Tadeck There should be no difference at all. Both ways end up with a local variable pointing directly at the dict instance. – Raymond Hettinger Feb 02 '12 at 08:10

For the benefit of people who stumble on this question while looking for a way to do memoization in Python, I recommend fastcache.

It works on Python 2 and 3, is faster than any of the methods described above, and gives the option to limit the cache size so that it does not inadvertently grow too big:

from fastcache import clru_cache

@clru_cache(maxsize=128, typed=False)
def foo(cat_1, cat_2, cat_3):
    return cat_1 + cat_2 + cat_3

Installing fastcache is simple, using pip:

pip install fastcache

or conda:

conda install fastcache
ostrokach
  • On Python 3 you can use the native `functools.lru_cache`. From my experiments, it works even a bit faster than the fastcache version. – alyaxey Oct 13 '17 at 11:07
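For completeness, the standard-library equivalent mentioned in the comment above (available since Python 3.2) looks like this:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def power_div(n):
    return (n ** 2) % 4  # example expression, should not matter

print(power_div(3))            # → 1
print(power_div.cache_info())  # hit/miss statistics for the cache
```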