19

Search for a value and get the parent dictionary names (keys):

Dictionary = {
    'dict1': {
        'part1': {
            '.wbxml': 'application/vnd.wap.wbxml',
            '.rl': 'application/resource-lists+xml',
        },
        'part2': {
            '.wsdl': 'application/wsdl+xml',
            '.rs': 'application/rls-services+xml',
            '.xop': 'application/xop+xml',
            '.svg': 'image/svg+xml',
        },
        'part3': {...}, ...
    },
    'dict2': {
        'part1': {
            '.dotx': 'application/vnd.openxmlformats-..',
            '.zaz': 'application/vnd.zzazz.deck+xml',
            '.xer': 'application/patch-ops-error+xml',
        },
        'part2': {...},
        'part3': {...}, ...
    }, ...
}

In the dictionary above I need to search for a value such as "image/svg+xml". No value is repeated anywhere in the dictionary. How can I search for "image/svg+xml" so that the parent keys are returned as a dictionary, { dict1:"part2" }?

Please note: Solutions should work unmodified for both Python 2.7 and Python 3.3.

Laxmikant Ratnaparkhi
  • Please don't use contradictory tags... choose one or the other. – A.J. Uppal Mar 04 '14 at 05:17
  • @aj8uppal - This is because I want the answer to work in both Python versions. If it works in Python 2.7 it must work in Python 3.3 and vice versa, because we have two servers: one uses Python 2.7 and the other uses the latest version. We are writing a script that should run on both servers. Sorry about that! – Laxmikant Ratnaparkhi Mar 04 '14 at 05:26
  • Something like this should be in the standard lib. – Jonathan Allen Grant Jun 13 '19 at 12:47

5 Answers

32

Here's a simple recursive version:

def getpath(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            return path
        elif hasattr(v, 'items'): # v is a dict
            p = getpath(v, value, path) # recursive call
            if p is not None:
                return p

Example:

print(getpath(dictionary, 'image/svg+xml'))
# -> ('dict1', 'part2', '.svg')
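The question asks for { dict1:"part2" } rather than the full key path. One way to get there is to keep only the first two keys of the path. This is a minimal sketch, assuming the three-level layout from the question; parent_keys is a hypothetical helper name, and the sample data is a shortened copy of the question's dictionary. It runs unchanged on Python 2.7 and Python 3.

```python
def getpath(nested_dict, value, prepath=()):
    """Return the tuple of keys leading to value, or None if it is absent."""
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value:  # found value
            return path
        elif hasattr(v, 'items'):  # v is a dict
            p = getpath(v, value, path)  # recursive call
            if p is not None:
                return p
    return None


def parent_keys(nested_dict, value):
    """Hypothetical helper: map the first key on the path to the second.

    Assumes the layout dictN -> partN -> extension from the question.
    Returns None if the value is missing or sits less than three levels deep.
    """
    path = getpath(nested_dict, value)
    if path is None or len(path) < 3:
        return None
    return {path[0]: path[1]}


# shortened sample data (assumption: same shape as the question's dictionary)
dictionary = {
    'dict1': {
        'part1': {'.rl': 'application/resource-lists+xml'},
        'part2': {'.svg': 'image/svg+xml'},
    },
    'dict2': {
        'part1': {'.zaz': 'application/vnd.zzazz.deck+xml'},
    },
}

print(parent_keys(dictionary, 'image/svg+xml'))
# -> {'dict1': 'part2'}
```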

To yield multiple paths (Python 3 only solution):

def find_paths(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            yield path
        elif hasattr(v, 'items'): # v is a dict
            yield from find_paths(v, value, path) 

print(*find_paths(dictionary, 'image/svg+xml'))
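Since the question requires Python 2.7 as well, the generator can also be written without yield from, which only exists in Python 3.3 and later. The sketch below replaces it with an explicit loop; the sample dictionary is made up for illustration.

```python
def find_paths(nested_dict, value, prepath=()):
    """Yield every key path (as a tuple) whose value equals `value`."""
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value:  # found value
            yield path
        elif hasattr(v, 'items'):  # v is a dict
            # explicit loop instead of "yield from", so Python 2.7 accepts it
            for p in find_paths(v, value, path):
                yield p


# made-up sample in which the value 1 occurs twice
sample = {'a': {'x': 1, 'y': 2}, 'b': {'z': 1}}

print(sorted(find_paths(sample, 1)))
# -> [('a', 'x'), ('b', 'z')]
```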
jfs
  • This is a great, clean solution, but I am wondering how to go about modifying it for cases/problems where the same value exists in multiple locations in the dictionary. Instead of just returning, we would append to a solution path list, right? – etnie1031 Jul 21 '22 at 14:35
  • @etnie1031 you could try replacing `return` with `yield` and handling the iterable return value, e.g., use a for-loop to enumerate the values yielded by `find_paths` (`getpath` would be a misleading name for multiple values). – jfs Jul 21 '22 at 16:57
10

This is an iterative traversal of your nested dicts that additionally keeps track of all the keys leading up to a particular point. Therefore as soon as you find the correct value inside your dicts, you also already have the keys needed to get to that value.

The code below will run as-is if you put it in a .py file. The find_mime_type(...) function returns the sequence of keys that will get you from the original dictionary to the value you want. The demo() function shows how to use it.

d = {'dict1':
         {'part1':
              {'.wbxml': 'application/vnd.wap.wbxml',
               '.rl': 'application/resource-lists+xml'},
          'part2':
              {'.wsdl': 'application/wsdl+xml',
               '.rs': 'application/rls-services+xml',
               '.xop': 'application/xop+xml',
               '.svg': 'image/svg+xml'}},
     'dict2':
         {'part1':
              {'.dotx': 'application/vnd.openxmlformats-..',
               '.zaz': 'application/vnd.zzazz.deck+xml',
               '.xer': 'application/patch-ops-error+xml'}}}


def demo():
    mime_type = 'image/svg+xml'
    try:
        key_chain = find_mime_type(d, mime_type)
    except KeyError:
        print('Could not find this mime type: {0}'.format(mime_type))
        return  # nothing to look up; leave demo() instead of calling exit()
    print('Found {0} mime type here: {1}'.format(mime_type, key_chain))
    nested = d
    for key in key_chain:
        nested = nested[key]
    print('Confirmation lookup: {0}'.format(nested))


def find_mime_type(d, mime_type):
    reverse_linked_q = list()
    reverse_linked_q.append((list(), d))
    while reverse_linked_q:
        this_key_chain, this_v = reverse_linked_q.pop()
        # finish search if found the mime type
        if this_v == mime_type:
            return this_key_chain
        # not found. keep searching
        # queue dicts for checking / ignore anything that's not a dict
        try:
            items = this_v.items()
        except AttributeError:
            continue  # this was not a nested dict. ignore it
        for k, v in items:
            reverse_linked_q.append((this_key_chain + [k], v))
    # if we haven't returned by this point, we've exhausted all the contents
    raise KeyError


if __name__ == '__main__':
    demo()

Output:

Found image/svg+xml mime type here: ['dict1', 'part2', '.svg']

Confirmation lookup: image/svg+xml
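The confirmation loop in demo() walks the key chain one key at a time. The same lookup can be folded into a single call with functools.reduce and operator.getitem, both from the standard library. This is only a sketch: get_by_key_chain is a hypothetical name, and the data is a shortened copy of d.

```python
from functools import reduce  # this import works on Python 2.6+ and Python 3
import operator


def get_by_key_chain(nested, key_chain):
    """Follow key_chain into nested, i.e. nested[k0][k1]...[kn]."""
    return reduce(operator.getitem, key_chain, nested)


# shortened copy of the answer's data
d = {'dict1': {'part2': {'.svg': 'image/svg+xml'}}}

print(get_by_key_chain(d, ['dict1', 'part2', '.svg']))
# -> image/svg+xml
```

An empty key chain returns the original object, and a wrong key raises KeyError just as the explicit loop would.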

Community
KobeJohn
  • @kobehjohn: I'm waiting for your further explanation – Laxmikant Ratnaparkhi Mar 04 '14 at 10:23
  • @LaxmikantGurnalkar did you try it yet? You can put all this code in one .py file and run it. It should give you the same output that I listed. If this is what you are looking for, then I will put some more documentation. If not, please let me know what is different. – KobeJohn Mar 04 '14 at 12:11
  • @kobehjohn: I have been too busy to work on that task. I will get to it in about an hour and will accept the answer once tested. Thanks – Laxmikant Ratnaparkhi Mar 04 '14 at 12:34
  • @LaxmikantGurnalkar Sounds good. Hope it works for you. I updated the answer code to show how to look up the value after getting the chain of keys. – KobeJohn Mar 04 '14 at 12:43
  • 1
    +1. Here's [the same algorithm implemented using recursive calls](http://stackoverflow.com/a/22171182/4279) – jfs Mar 04 '14 at 13:58
  • btw, please, use `[]` list literal instead of `list()`. An empty tuple `()` would work as well (as in my code) – jfs Mar 04 '14 at 14:16
  • @J.F.Sebastian I tend to use language when possible to make things more explicit and avoid confusing new/multilingual programmers. Is there a guideline that says use the literal for an empty list? I've been searching but can't find it either way. – KobeJohn Mar 04 '14 at 14:21
  • @kobejohn: I've never tried to find such guideline. A person who doesn't know what `[]` means in Python is unlikely to know what `list()` does either. Moreover, `list` might create wrong associations for people from other languages e.g., `list()` is not a linked list. – jfs Mar 04 '14 at 14:31
  • @J.F.Sebastian This is good food for thought. I have to imagine there are also languages that use `[]` for something other than a ~dynamically sized array. `{}` certainly has a large range of uses. Also, `list(), dict(), tuple()` are more visually differentiated than `[], {}, ()` so I see less chance for misreading/error. In any case, thanks for the feedback. At least I will consider which one is more readable in each case from now on. – KobeJohn Mar 04 '14 at 14:56
  • @kobejohn: I don't buy the "easy to overlook" argument. Don't start to use `str()` instead of `''` because the latter is easier to overlook. – jfs Mar 04 '14 at 15:01
3

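Here is a solution that works for a complex data structure of nested lists and dicts. It returns a path string for every key, list index, or leaf value that contains the search pattern: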

import pprint

def search(d, search_pattern, prev_datapoint_path=''):
    output = []
    current_datapoint = d
    current_datapoint_path = prev_datapoint_path
    if type(current_datapoint) is dict:
        for dkey in current_datapoint:
            # path to this key (assumes string keys, as in the examples)
            c = current_datapoint_path
            c += "['" + dkey + "']"
            if search_pattern in str(dkey):
                output.append(c)
            # recurse into the value stored under this key
            for found in search(current_datapoint[dkey], search_pattern, c):
                output.append(found)
    elif type(current_datapoint) is list:
        for i in range(0, len(current_datapoint)):
            # path to this list element
            c = current_datapoint_path
            c += "[" + str(i) + "]"
            if search_pattern in str(i):
                output.append(c)  # record the path, not the bare index
            # recurse into the element; a separate name avoids shadowing i
            for found in search(current_datapoint[i], search_pattern, c):
                output.append(found)
    elif search_pattern in str(current_datapoint):
        # leaf value matches: record the path that led here
        c = current_datapoint_path
        output.append(c)
    output = filter(None, output)
    return list(output)


if __name__ == "__main__":
    d = {'dict1':
             {'part1':
                  {'.wbxml': 'application/vnd.wap.wbxml',
                   '.rl': 'application/resource-lists+xml'},
              'part2':
                  {'.wsdl': 'application/wsdl+xml',
                   '.rs': 'application/rls-services+xml',
                   '.xop': 'application/xop+xml',
                   '.svg': 'image/svg+xml'}},
         'dict2':
             {'part1':
                  {'.dotx': 'application/vnd.openxmlformats-..',
                   '.zaz': 'application/vnd.zzazz.deck+xml',
                   '.xer': 'application/patch-ops-error+xml'}}}

    d2 = {
        "items":
            {
                "item":
                    [
                        {
                            "id": "0001",
                            "type": "donut",
                            "name": "Cake",
                            "ppu": 0.55,
                            "batters":
                                {
                                    "batter":
                                        [
                                            {"id": "1001", "type": "Regular"},
                                            {"id": "1002", "type": "Chocolate"},
                                            {"id": "1003", "type": "Blueberry"},
                                            {"id": "1004", "type": "Devil's Food"}
                                        ]
                                },
                            "topping":
                                [
                                    {"id": "5001", "type": "None"},
                                    {"id": "5002", "type": "Glazed"},
                                    {"id": "5005", "type": "Sugar"},
                                    {"id": "5007", "type": "Powdered Sugar"},
                                    {"id": "5006", "type": "Chocolate with Sprinkles"},
                                    {"id": "5003", "type": "Chocolate"},
                                    {"id": "5004", "type": "Maple"}
                                ]
                        },

                        ...

                    ]
            }
    }

pprint.pprint(search(d,'svg+xml','d'))
>> ["d['dict1']['part2']['.svg']"]

pprint.pprint(search(d2,'500','d2'))
>> ["d2['items']['item'][0]['topping'][0]['id']",
 "d2['items']['item'][0]['topping'][1]['id']",
 "d2['items']['item'][0]['topping'][2]['id']",
 "d2['items']['item'][0]['topping'][3]['id']",
 "d2['items']['item'][0]['topping'][4]['id']",
 "d2['items']['item'][0]['topping'][5]['id']",
 "d2['items']['item'][0]['topping'][6]['id']"]
Vinay Kumar
  • 1
    Hello, and welcome to SO! Please add more detail and explanation to your answer. – Devang Padhiyar Apr 06 '19 at 14:15
  • I am not a regular programmer. I apologize for the novice coding style. The solution is intended to work for any complex dictionary with a nested dictionary of dicts and lists. The logic is very simple. I start from the top hierarchy and recursively go through the dicts and lists down the hierarchy. At each point, I check if any key or its value (for dicts) or any index or its value (for lists) match the search pattern. If there is a match the path till that point is pushed into the output list. – Vinay Kumar Apr 06 '19 at 14:17
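The logic described in the comment above (descend through dicts and lists, and record the path whenever a key, an index, or a leaf value matches) can also be written so that each path is a tuple of keys and indexes instead of a string. That way a result can be used for a lookup directly. This is a sketch under that assumption, not the answer author's code; search_paths and the shortened sample data are made up for illustration.

```python
def search_paths(data, pattern, path=()):
    """Yield key/index tuples whose key, index, or leaf value contains pattern."""
    if isinstance(data, dict):
        children = data.items()
    elif isinstance(data, list):
        children = enumerate(data)
    else:
        # leaf value: compare its string form, as the answer does
        if pattern in str(data):
            yield path
        return
    for key, child in children:
        child_path = path + (key,)
        if pattern in str(key):  # the key or index itself matches
            yield child_path
        for found in search_paths(child, pattern, child_path):
            yield found


# shortened sample in the shape of d2 (assumption: illustrative data only)
d2 = {'items': {'item': [
    {'id': '0001',
     'topping': [{'id': '5001', 'type': 'None'},
                 {'id': '5002', 'type': 'Glazed'}]},
]}}

for p in sorted(search_paths(d2, '500')):
    print(p)
# -> ('items', 'item', 0, 'topping', 0, 'id')
# -> ('items', 'item', 0, 'topping', 1, 'id')
```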
1

Here are two similar quick-and-dirty ways of doing this type of operation. The function find_parent_dict1 uses a dict comprehension; if you are uncomfortable with that, find_parent_dict2 uses the infamous nested for loops. Both assume exactly three levels of nesting and read the search value from the global variable value.

Dictionary = {'dict1':{'part1':{'.wbxml':'1','.rl':'2'},'part2':{'.wbdl':'3','.rs':'4'}},'dict2':{'part3':{'.wbxml':'5','.rl':'6'},'part4':{'.wbdl':'1','.rs':'10'}}}

value = '3'

def find_parent_dict1(Dictionary):
    for key1 in Dictionary.keys():
        item = {key1:key2 for key2 in Dictionary[key1].keys() if value in Dictionary[key1][key2].values()}
        if len(item)>0:
            return item

find_parent_dict1(Dictionary)


def find_parent_dict2(Dictionary):
    for key1 in Dictionary.keys():
        for key2 in Dictionary[key1].keys():
            if value in Dictionary[key1][key2].values():
                print({key1: key2})  # function-call form works on Python 2.7 and 3

find_parent_dict2(Dictionary)
BushMinusZero
  • 2
    It doesn't work for arbitrary nested dictionaries. `.keys()` call is redundant in `for`-loops. – jfs Mar 04 '14 at 14:01
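Following up on that comment, the two-level search can be written without the redundant .keys() calls and without relying on a global. The sketch below still assumes exactly three levels of nesting, so it does not handle arbitrary depth; find_parent_dict is a hypothetical name, and the data is the dictionary from this answer.

```python
def find_parent_dict(dictionary, value):
    """Return {outer_key: inner_key} for each inner dict that contains value.

    Assumes the fixed layout outer -> inner -> leaf values. If two inner
    dicts under the same outer key held the value, the later one would win.
    """
    found = {}
    for key1, parts in dictionary.items():
        for key2, leaves in parts.items():
            if value in leaves.values():
                found[key1] = key2
    return found


Dictionary = {
    'dict1': {'part1': {'.wbxml': '1', '.rl': '2'},
              'part2': {'.wbdl': '3', '.rs': '4'}},
    'dict2': {'part3': {'.wbxml': '5', '.rl': '6'},
              'part4': {'.wbdl': '1', '.rs': '10'}},
}

print(find_parent_dict(Dictionary, '3'))
# -> {'dict1': 'part2'}
```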
1

Traverses a nested dict looking for a particular value. When success is achieved the full key path to the value is printed. I left all the comments and print statements for pedagogical purposes (this isn't production code!)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 24 17:16:46 2022

@author: wellington
"""


class Tree(dict):
    """
    allows autovivification as in Perl hashes
    """

    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

# tracking the key sequence when seeking the target
key_list = Tree()

# dict storing the target success result
success = Tree()


# example nested dict of dicts and lists
E = {
    'AA':
        {
          'BB':
               {'CC':
                     {
                      'DD':
                          {
                           'ZZ':'YY',
                           'WW':'PP'
                           },
                       'QQ':
                           {
                            'RR':'SS'
                            },
                     },
                'II': 
                     {
                      'JJ':'KK'
                     }, 
                'LL':['MM', 'GG', 'TT']
               }
        }
    }


def find_keys_from_value(data, target):
    """
    recursive function -
    given a value it returns all the keys in the path to that value within
    the dict "data"
    there are many paths and many false routes
    at the end of a given path if success has not been achieved
    the function discards keys to get back to the next possible path junction
    """

    print(f"the number of keys in the local dict is {len(data)}")

    for key in data:
        # if target has been located stop iterating through keys
        if success[target] == 1:
            break

        # add key to the current path
        k_list.append(key)
        print(f"printing k_list after append {k_list}")

        # if target located set success[target] = 1 and exit
        if key == target or data[key] == target:
            key_list[target] = k_list
            success[target] = 1
            break
        # if the target has not been located check to see if the value
        # associated with the new key is a dict and if so call the
        # recursive function with the new dict as "data"
        elif isinstance(data[key], dict):
            print(f"\nvalue is dict\n {data[key]}")
            find_keys_from_value(data[key], target)

        # check to see if the value associated with the new key is a list
        elif isinstance(data[key], list):
            # search through the list
            for i in data[key]:
                # if the list element is a dict, recurse into it
                if isinstance(i, dict):
                    find_keys_from_value(i, target)
                # check to see if the list element is the target
                elif i == target:
                    print(f"list entry {i} is target")
                    success[target] = 1
                    key_list[target] = k_list
                else:
                    print(f"list entry {i} is not target")
                # stop scanning the list once the target has been found
                if success[target] == 1:
                    break
        else:
            print(f"value {data[key]} is not target")

        # this key did not lead to the target, so discard it exactly once
        # before trying the next key at this level (the earlier version
        # could remove two keys here and lose a parent from the path)
        if success[target] != 1:
            print(f"removing key {key} from {k_list}")
            k_list.pop()


# select target values
values = ["PP", "SS", "KK", "TT"]
success = {}

for target in values:
    print(f"\nlooking for target {target}")
    success[target] = 0
    k_list = []
    find_keys_from_value(E, target)
    print(f"\nprinting key_list for target {target}")
    print(f"{key_list[target]}\n")
    print("\n****************\n\n")