2

I am trying to create a dict which contains only a limited number of MRU entries (for helping in caching the output of a costly C function I call via ctypes). Here is the code:

from collections import OrderedDict

class MRUDict(OrderedDict):

    def __init__(self, capacity = 64):
        super().__init__()
        self.__checkAndSetCapacity(capacity)

    def capacity(self):
        return self.__capacity

    def setCapacity(self, capacity):
        self.__checkAndSetCapacity(capacity)
        for i in range(len(self) - capacity):
            self.__evict() # will execute only if len > capacity

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # if above raises IndexError, next line won't execute
        print("Moving key {} to last i.e. MRU position".format(key))
        super().move_to_end(key)
        return value

    def __setitem__(self, key, value):
        if key in self:
            super().move_to_end(key)
        else: # new key
            if len(self) == self.__capacity:
                self.__evict()
        super().__setitem__(key, value)

    def __evict(self):
        key, value = self.popitem(last = False) # pop first i.e. oldest item
        print("Capacity exceeded. Evicting ({}, {})".format(key, value))

    def __checkAndSetCapacity(self, capacity):
        if not isinstance(capacity, int):
            raise TypeError("Capacity should be an int.")
        if capacity == 0:
            raise ValueError("Capacity should not be zero.")
        self.__capacity = capacity

... and here is the testing code:

def printkeys(d):
    print("Current keys in order:", tuple(d)) # here d means d.keys()
    print()

from mrudict import MRUDict
print("Creating MRUDict with capacity 5.")
d = MRUDict(5)
print("Adding keys 0 to 7 with values:")
for i in range(8): d[i] = i + 0.1
printkeys(d)

print("Calling str on object:")
print(d) # test of default __repr__ (since probably __str__ is the same here)
printkeys(d)

print("Accessing existing key 4:")
print(4, d[4]) # test of __getitem__
printkeys(d)

try:
    print("Accessing non-existing key 20:")
    print(20, d[20]) # test of __getitem__
except:
    print("Caught exception: key does not exist.")
printkeys(d)

print("Updating value of existing key 6:")
d[6] = 6.6 # test of __setitem__ with existing key
printkeys(d)

print("Adding new key, value pair:")
d[10] = 10.1 # test of __setitem__ with non-existing key
printkeys(d)

print("Testing for presence of key 3:")
print(3 in d)
printkeys(d)

print("Trying to loop over the items:")
for k in d: print(k, d[k])
printkeys(d)

print("Trying to loop over the items:")
for k, v in d.items(): print(k, v)
printkeys(d)

Now from the output it seems I am being kind of naive in implementing the __getitem__ function because both __repr__ and for ... in (which, I'm guessing here, call __iter__ and then __getitem__) causes the first item to be moved to the last as MRU, but cannot proceed further because there is no "next" item for the iterator since it now points to the last element. But I am not sure what I can do to fix the situation. Should I reimplement __iter__?

I am not sure how to distinguish between the user's calling __getitem__ and an internal call to the same. Of course, a workaround is to make the user to use a find() method which would do the move-to-end thing, but I'd really like to be able to use the regular syntax d[k].

Please advise on how to fix this. Thanks!

jamadagni
  • 1,214
  • 2
  • 13
  • 18
  • Could you explain the problem more clearly, please; how does the output you get from your current testing differ from what you wanted/expected? – jonrsharpe May 17 '14 at 11:10
  • @jonrsharpe: `__getitem__` modifies the order.. So when iterating, the first item moves to the last and iteration apparently breaks. – Martijn Pieters May 17 '14 at 11:13
  • Internally, `OrderedDict` uses chained list. `__iter__` goes over that chained list directly, no `__getitem__` is being called. – Martijn Pieters May 17 '14 at 11:21
  • Ah, `.items()` reuses `MutableMapping.items()`, returning a `ItemsView()` object that *does* use `__getitem__` on your custom class. Replace *that* to not use `__getitem__` but some other internal method. – Martijn Pieters May 17 '14 at 11:29
  • I updated my answer now that I had some time to actually parse your test suite. For future reference: do try to reduce your sample code to something that immediately demonstrates the issue. Most of your test suite is extraneous to your problem. – Martijn Pieters May 17 '14 at 14:32

1 Answers1

2

For complex changes of behaviour like these, it pays to study the OrderedDict source code.

The actual __iter__ method loops directly over the internal structure, the doubly linked list that maintains the item order. It'll never directly use __getitem__, instead just returning the keys from the linked list.

The actual problem that you are having is that you are directly accessing the items while looping:

for k in d: print(k, d[k])

There is a d[k] in there; it is that access that moves item 5 from start to end. This updates the linked list, so when asking for the next item the curr.next reference is now the root and iteration stops.

The work-around would be to not do that. Add a dedicated method to access items without triggering the MRU update. Or you could re-use dict.get() for example:

>>> for k in d: print(k, d.get(k))
... 
5 5.1
7 7.1
4 4.1
6 6.6
10 10.1

You will have a problem with the .items() method; OrderedDict reuses collections.abc.MutableMapping's .items() method, which returns a collections.abc.ItemsView() instance; see the collections.abc source code.

You'll have to replace that behaviour:

from collections.abc import ItemsView


class MRUDictItemsView(ItemsView):
    def __contains__(self, item):
        key, value = item
        v = self._mapping.get(key, object())
        return v == value

    def __iter__(self):
        for key in self._mapping:
            yield (key, self._mapping.get(key))


class MRUDict(OrderedDict):
    # ...

    def items(self):
        return MRUDictItemsView(self)

You'd have to do the same for the .keys() and .values() methods.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Hello thank you very much for your detailed reply and sorry about the extra stuff in the test suite. Now I understand I have to reimplement the `ItemsView` to avoid using `[]` i.e. `__getitem__` and to use `.get()` instead, and from the `collections.abc` link you give it is clear I have to reimplement `ValuesView` too, but given that `KeysView` doesn't seem to use `[]` I don't need to reimplement that, right? – jamadagni May 18 '14 at 02:36
  • OK what I did on top of what Martijn gave as code above was to just to add a `, ValuesView` to the `from collections.abc import` line and to add a `MRUDictValuesView` at module level replacing the `[]` in the `ValuesView` methods by `.get()`, and to add a `.values()` to `MRUDict` and it works now. Thank you very much Martijn! – jamadagni May 18 '14 at 03:01
  • Right, `KeysView` indeed won't trigger `__getitem__`. – Martijn Pieters May 18 '14 at 07:18