28

I have a data in form a dictionary.. NOw I take the input from the user and it can be anything.. And I am trying to do the following. If the key exists then cool.. fetch the value from the dictionary. if not, then fetch the nearest (in numeric sense). For example..if the input key is 200 and the keys are like :....

197,202,208...

Then probably 202 is the closest key to 200.. Now, from algorithm point of view. its straight forward.. but is there a pythonic way to do this? Thanks

frazman
  • 32,081
  • 75
  • 184
  • 269
  • 5
    Does it need to be a `dict`, or would a "dictionary-like" object suffice? If instead you use a binary tree or sorted list, then you can use binary search to find the closest key in O(log n) time. – Daniel Pryden Oct 28 '11 at 20:35
  • 2
    "from algorithm point of view. its straight forward"... I assume this means you're okay with O(n) solutions, as O(log n) solutions are less straightforward. – Laurence Gonsalves Oct 28 '11 at 20:45

6 Answers6

40

This issue is made a lot harder by dict keys being in no particular order. If you can play with how you make the dict so they are in order (like your example) and use python >= 2.7 you can use OrderedDict and bisect to make this lightning fast.

import collections
a = collections.OrderedDict()
for i in range(100):
    a[i] = i

import bisect
ind = bisect.bisect_left(a.keys(), 45.3)

Then you only have to check element ind and ind-1 to see which is closer, thus making a lot fewer calculations.


As pointed out below by Steven G, in Python3 the .keys() is not just a list and must be changed into one.

bisect.bisect_left(list(a.keys()), 45.3)
Brian Larsen
  • 1,740
  • 16
  • 28
  • 1
    I get `TypeError: 'odict_keys' object does not support indexing` when trying your solution on python 3.6 – Steven G Jan 04 '18 at 16:13
  • 1
    this can be corrected by using `bisect.bisect_left(list(a.keys()), 45.3)` – Steven G Jan 04 '18 at 16:28
  • 1
    as of Python 3.7, `dict` preserves the insertion order (https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so you don't even need to use OrderedDict for this anymore. – nurettin May 18 '22 at 12:06
33

here's your function on one line:

data.get(num, data[min(data.keys(), key=lambda k: abs(k-num))])

edit: to not evaluate the min when the key is in the dict use:

data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]

or if all values in data evaluate to True you can use:

data.get(num) or data[min(data.keys(), key=lambda k: abs(k-num))]
Will
  • 4,498
  • 1
  • 21
  • 20
  • 3
    Unfortunately, this evaluates `min(data.keys()...)` for every lookup, even if the key exists in data. Maybe break up get's logic into a ternary: `data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]` – PaulMcG Oct 28 '11 at 21:56
  • 1
    Glad to help, but `if d.has_key(k)` has been deprecated in favor of `if k in d`. – PaulMcG Oct 28 '11 at 23:44
19

Rather than using OrderedDict and bisect, consider the SortedDict type in the sortedcontainers module. It's a pure-Python and fast-as-C implementation of sorted list, sorted dict, and sorted set types with 100% testing coverage and hours of stress.

With a SortedDict you can bisect for the desired key. For example:

from itertools import islice
from sortedcontainers import SortedDict

def closest(sorted_dict, key):
    "Return closest key in `sorted_dict` to given `key`."
    assert len(sorted_dict) > 0
    keys = list(islice(sorted_dict.irange(minimum=key), 1))
    keys.extend(islice(sorted_dict.irange(maximum=key, reverse=True), 1))
    return min(keys, key=lambda k: abs(key - k))

The closest function uses SortedDict.irange to create an iterator of keys nearest the given key. The keys are bisected with log(N) runtime complexity.

>>> sd = SortedDict({-3: 'a', 0: 'b', 2: 'c'})
>>> for num in range(-5, 5):
...     key = closest(sd, num)
...     print('Given', num, ', closest:', key)
Given -5 , closest: -3
Given -4 , closest: -3
Given -3 , closest: -3
Given -2 , closest: -3
Given -1 , closest: 0
Given 0 , closest: 0
Given 1 , closest: 2
Given 2 , closest: 2
Given 3 , closest: 2
Given 4 , closest: 2

It's Pythonic to use PyPI!

GrantJ
  • 8,162
  • 3
  • 52
  • 46
  • How does `SortedDict()` handle negative key values? – cosmictypist Feb 29 '16 at 19:46
  • I've been using `SortedDict()`, but it incorrectly sorts the keys for negative values. – cosmictypist Feb 29 '16 at 19:47
  • @christylynn002 please open an issue at https://github.com/grantjenks/sorted_containers/issues – GrantJ Feb 29 '16 at 23:41
  • Is this issue fixed? – Paebbels Apr 17 '16 at 15:06
  • 1
    @Paebbels No issue was opened. I could never reproduce myself. – GrantJ Apr 17 '16 at 23:23
  • I think I know what's wrong. The reason is not within SortedDict, but this code itself - sd.iloc[-1] returns last element of sd, when it should (within common sense) return first element or None. Should I edit? – ogurets Aug 30 '18 at 15:54
  • 1
    @ogurets Good observation! I see now how the previous code could include the largest key when `index` equalled 0. I updated the code for version 2 of sortedcontainers. Better here to use `irange` than the deprecated `iloc`. – GrantJ Aug 30 '18 at 19:37
1

If all you have is a Python dictionary, you can't do better than checking all the entries in the dictionary (as in Will's answer). However, if you want to find the closest key more efficiently than that (i.e., in O(log N) instead of O(N)), you want a balanced tree of some sort.

Unfortunately, I don't believe Python has such a datastructure in its standard library -- as the Pythonic way is to use a dict instead. So, if you expect to make a many such queries on a large map, your best choice may be to find an extension library, or even roll your own...

comingstorm
  • 25,557
  • 3
  • 43
  • 67
  • 1
    Check out `bisect` for what you describe. Create a class with a bisect for keys and a dict for key-value mapping. Use the bisect to find the proper insertion point of a new key in the list of keys, and then check the neighboring values to see which one is closer. – PaulMcG Oct 28 '11 at 21:57
0

Using sortedcontainers.SortedDict, you can do this like this:

def closest_item(sdict, key):
    if len(sdict) == 0:
        raise KeyError('No items in {sdict.__class__.__name__}')

    if len(sdict) == 1:
        return next(iter(sdict.items()))

    idx_before = next(sdict.irange(minimum=key), None)
    idx_after = next(sdict.irange(maximum=key, reverse=True), None)

    if idx_before is None:
        idx = idx_after

    elif idx_after is None:
        idx = idx_before
    else:
        idx = min(idx_before, idx_after, key=lambda x: abs(x - key))

    return idx, sdict[idx]
MaxNoe
  • 14,470
  • 3
  • 41
  • 46
0

This should do what you want (minus getting it from a key, but you can figure that out :).

f = lambda a,l:min(l,key=lambda x:abs(x-a))
numbers = (100, 200, 300, 400)
num = int(raw_input())
print 'closest match:', f(num, numbers)

Note: f is from this question.

Community
  • 1
  • 1
John Doe
  • 3,436
  • 1
  • 18
  • 21