0

I have a list nested inside a dictionary, which is itself nested inside another dictionary. The data set is very large. What is the fastest way to return the nested list if I am given a List that is specific to the key, dict pairs?

{"Dict1":{"Dict2": ['UNIQUE LIST'] }}

Is there an alternative data structure that would be more efficient for this?

Alex

3 Answers

1

I do not believe a more efficient data structure exists in Python. Simply retrieving the list using the regular indexing operator should be a very fast operation, even if both levels of dictionaries are very large.

nestedDict = {"Dict1":{"Dict2": ['UNIQUE LIST'] }}
uniqueList = nestedDict["Dict1"]["Dict2"]

My only thought for improving performance was to flatten the data structure into a single dictionary with tuples for keys. This takes more memory than the nested approach, since each top-level key is repeated in the tuple key of every entry from its second-level dictionary, but it needs only one dictionary lookup per access instead of two. In practice, though, this approach is actually slower than the nested one:

nestedDict = {i: {j: ['UNIQUE LIST'] for j in range(1000)} for i in range(1000)}
flatDict = {(i, j): ['UNIQUE LIST'] for i in range(1000) for j in range(1000)}

import random

def accessNested():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return nestedDict[i][j]

def accessFlat():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return flatDict[(i, j)]

import timeit

print(timeit.timeit(accessNested))
print(timeit.timeit(accessFlat))

Output:

2.0440238649971434
2.302736301004188
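
Note that the benchmark above also pays for two random.randrange calls on every access. Here is a minimal sketch that isolates the lookup itself, assuming fixed keys and smaller tables so it runs quickly. The names N, lookup_nested, and lookup_flat are mine, and the absolute numbers will vary by machine and Python version:

```python
import timeit

# Smaller tables than the 1000 x 1000 used above, just to keep the sketch quick.
N = 200
nested = {i: {j: ["UNIQUE LIST"] for j in range(N)} for i in range(N)}
flat = {(i, j): ["UNIQUE LIST"] for i in range(N) for j in range(N)}

def lookup_nested(i, j):
    # Two dictionary lookups, each on a small integer key.
    return nested[i][j]

def lookup_flat(i, j):
    # Builds a tuple, hashes it (which hashes both elements), then one lookup.
    return flat[(i, j)]

# Fixed keys, so random number generation is not part of the measurement.
t_nested = timeit.timeit(lambda: lookup_nested(17, 42), number=100_000)
t_flat = timeit.timeit(lambda: lookup_flat(17, 42), number=100_000)
print(f"nested: {t_nested:.4f}s  flat: {t_flat:.4f}s")
```

Hashing a tuple still hashes each of its elements, which is one plausible reason the flat layout does not come out ahead.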
ApproachingDarknessFish
0

The fastest way to access the list within the nested dictionary is plain indexing:

d = {"Dict1":{"Dict2": ['UNIQUE LIST'] }}

print(d["Dict1"]["Dict2"])

Output:

['UNIQUE LIST']

If you want to iterate over the list inside the nested dictionary, you can use the following code as an example:

d = {"a":{"b": ['1','2','3','4'] }} 

for i in d["a"]["b"]:
    print(i)

Output:

1
2
3
4
Usman
0

If I understand correctly, you want to access a nested dictionary structure if...

if I am given a List that is specific to the key

So, here is a sample dictionary and the key that you want to access:

d = {'a': {'a': 0, 'b': 1}, 
     'b': {'a': {'a': 2}, 'b': 3}}
key = ('b', 'a', 'a')

The lazy approach

This is fast to write if you already know Python dictionaries; there is nothing else to learn!

>>> value = d
>>> for level in key:
...     value = value[level]
>>> value
2
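
The same loop can be wrapped in a small reusable helper. This is a minimal sketch using functools.reduce and operator.getitem from the standard library; the name get_nested is my own and not part of any package:

```python
from functools import reduce
from operator import getitem

def get_nested(mapping, path):
    """Follow each key in `path` one level deeper into `mapping`."""
    # reduce applies getitem(current_value, key) for every key in turn,
    # starting from the outermost dictionary.
    return reduce(getitem, path, mapping)

d = {"a": {"a": 0, "b": 1},
     "b": {"a": {"a": 2}, "b": 3}}
key = ("b", "a", "a")

print(get_nested(d, key))  # prints 2
```

A missing key raises KeyError at the level where it is missing, just as chained indexing would.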

NestedDict from the ndicts package

If you pip install ndicts then you get the same "lazy approach" implementation in a nicer interface.

>>> from ndicts import NestedDict
>>> nd = NestedDict(d)
>>> nd[key]
2
>>> nd["b", "a", "a"]
2

This option is fast to write, because you can't really write less code than nd[key] to get what you want.

Pandas dataframes

This is the solution that will give you performance. Lookups in dataframes should be quick, especially if you have a sorted index.

In this case we have hierarchical data with multiple levels, so I will create a MultiIndex first. I will use the NestedDict for ease, but anything else to flatten the dictionary will do.

>>> keys = list(nd.keys())
>>> values = list(nd.values())
>>> from pandas import DataFrame, MultiIndex
>>> index = MultiIndex.from_tuples(keys)
>>> df = DataFrame(values, index=index, columns=["Data"]).sort_index()
>>> df
         Data
a a NaN     0
  b NaN     1
b a a       2
  b NaN     3
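
If you would rather not install ndicts just for the flattening step, a small recursive generator should be enough. This is a minimal sketch, assuming every non-dict value counts as a leaf; the name flatten is my own:

```python
def flatten(mapping, prefix=()):
    """Yield (key_path, leaf_value) pairs from an arbitrarily nested dict."""
    for key, value in mapping.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level, carrying the keys seen so far.
            yield from flatten(value, path)
        else:
            yield path, value

d = {"a": {"a": 0, "b": 1},
     "b": {"a": {"a": 2}, "b": 3}}

flat = dict(flatten(d))
keys = list(flat)            # tuples of keys, one per leaf
values = list(flat.values())
print(flat)
```

The resulting keys and values lists can be passed to MultiIndex.from_tuples and DataFrame in the same way as the ones taken from the NestedDict.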

Use the loc method to get a row.

>>> df.loc[key]
Data    2
Name: (b, a, a), dtype: int64
edd313