
Suppose I want to create a dict (or dict-like object) that returns a default value if I attempt to access a key that's not in the dict.

I can do this either by using a defaultdict:

from collections import defaultdict

foo = defaultdict(lambda: "bar")
print(foo["hello"]) # "bar"

or by using a regular dict and always using dict.get(key, default) to retrieve values:

foo = dict()
print(foo.get("hello", "bar")) # "bar"
print(foo["hello"]) # KeyError (as expected)

Other than the obvious ergonomic overhead of having to remember to use .get() with a default value instead of the expected bracket syntax, what's the difference between these 2 approaches?

jidicula
  • `dict.get(key, default)` does not do the same thing as `defaultdict[key]`, because `defaultdict[key]` sets `key` to the return value of the callable. `dict.get` does not modify the dict – C.Nivs Feb 19 '21 at 14:25
  • Ah, so the callable is only called once for a missing key. Thanks for this detail! – jidicula Feb 19 '21 at 14:31
  • Here `defaultdict[key]` behaves more like `dict.setdefault(key, value)` – Sayandip Dutta Feb 19 '21 at 14:33
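
The behaviour described in these comments can be checked with a short sketch. The variable names (`dd`, `plain`, `untouched`) are made up for illustration and do not come from the question.

```python
from collections import defaultdict

dd = defaultdict(lambda: "bar")
plain = {}
untouched = {}

dd["hello"]                       # missing key: stores "bar" under "hello"
plain.setdefault("hello", "bar")  # also stores "bar" under "hello"
untouched.get("hello", "bar")     # returns "bar" but stores nothing

print(dict(dd))    # {'hello': 'bar'}
print(plain)       # {'hello': 'bar'}
print(untouched)   # {}
```

So bracket access on a defaultdict ends up in the same state as `setdefault`, while `.get` leaves the dict as it was.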

2 Answers


Aside from the ergonomics of having .get everywhere, one important difference is that looking up a missing key in a defaultdict inserts a new element into the dict rather than just returning the default. The most important implications of this are:

  • Later iterations will include every key that was ever looked up in the defaultdict
  • Because every looked-up key is stored, the defaultdict typically uses more memory
  • Mutating the default stores that mutation in a defaultdict; with .get the mutated default is lost unless it is stored explicitly

from collections import defaultdict

default_foo = defaultdict(list)
dict_foo = dict()

for i in range(1024):
    default_foo[i]       # inserts an empty list under key i
    dict_foo.get(i, [])  # returns a throwaway list, stores nothing

print(len(default_foo)) # 1024
print(len(dict_foo)) # 0

# Defaults in a defaultdict can be mutated, whereas with .get mutations are lost
default_foo[1025].append("123")
dict_foo.get(1025, []).append("123")

print(default_foo[1025]) # ['123']
print(dict_foo.get(1025, [])) # []
xulaus
  • `defaultdict` inserting a new element itself is a big difference imo, because it means that the time cost of calling the `default_factory` callable only has to be paid once for a missing key, while if that callable was the default arg to `.get()`, that call would happen on every single get for the missing key. – jidicula Feb 19 '21 at 17:28
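
A rough sketch of the cost difference mentioned in this comment, counting how often the factory runs. The `make_default` function and the `calls` counter are hypothetical names used only for this illustration.

```python
from collections import defaultdict

calls = {"factory": 0}  # illustrative counter, not part of any library

def make_default():
    """Stand-in for an expensive default factory."""
    calls["factory"] += 1
    return []

# defaultdict: the factory runs once, then the stored value is reused.
dd = defaultdict(make_default)
for _ in range(3):
    dd["missing"]
factory_calls_defaultdict = calls["factory"]

# dict.get: the default argument is evaluated before .get is even called,
# so the factory runs on every lookup.
calls["factory"] = 0
plain = {}
for _ in range(3):
    plain.get("missing", make_default())
factory_calls_get = calls["factory"]

print(factory_calls_defaultdict)  # 1
print(factory_calls_get)          # 3
```

Note that with `.get` the default expression is evaluated even when the key is present, since Python evaluates arguments before making the call.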

The difference here really comes down to how you want your program to handle a KeyError.

foo = dict()

def do_stuff_with_foo():
    print(foo["hello"])
    # Do something here
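
if __name__ == "__main__":
    try:
        foo["hello"] # Succeeds only if the key exists and has a value
    except KeyError:
        # The first code snippet (defaultdict) effectively does this:
        # store a default and carry on
        foo["hello"] = "bar"
        do_stuff_with_foo()

        # The second code snippet (plain dict, bracket access) amounts to this
        # if the KeyError is left unhandled: the program stops
        exit(-1)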

The question is what should happen when the key is missing: do we want to stop the program entirely, do we want the user to fill in a value for foo["hello"], or do we want to use a default value?

The first approach is a more compact way of writing foo.get("hello", "bar"). The real question is whether that is what we actually want to happen.
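
As a minimal sketch of that choice, the helper below makes the policy explicit at the call site. `lookup` and its `strict` parameter are hypothetical names invented for this example, not part of the standard library.

```python
def lookup(mapping, key, default=None, strict=False):
    """Hypothetical helper: return mapping[key], or apply a missing-key policy."""
    try:
        return mapping[key]
    except KeyError:
        if strict:
            raise        # stop: let the caller (or the whole program) fail
        return default   # carry on with a fallback value

foo = {}
print(lookup(foo, "hello", default="bar"))  # bar

try:
    lookup(foo, "hello", strict=True)
except KeyError as err:
    print("missing key:", err)  # missing key: 'hello'
```

This assumes `mapping` is a plain dict; a defaultdict would never reach the `except` branch, because bracket access on it does not raise for missing keys.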

broderick