12

I find myself avoiding dictionaries because, often, nearly half of their code is duplicated. This typically happens with nested dictionaries, where all sub-dictionaries contain the same keys but different values.

I manually create a large parent dictionary, where each key contains a nested dictionary, to be used in external modules. The nested dictionaries all use the same keys to define configuration parameters. This usage is explicit and works but it feels foolish to retype or copy/paste the keys for each nested dictionary I create manually. I am not overly concerned about optimizing memory or performance, just wondering if I should be doing this another, more Pythonic way.

As a trivial example and pattern often seen:

people_dict = {
    "Charles Lindberg": {"address": "123 St.", 
                         "famous": True}, 
    "Me": {"address": "456 St.",
           "famous": False}
    }

>>> people_dict["Charles Lindberg"]["address"]
"123 St."

While the dictionary enables explicit code, it is tedious and error-prone to define nested dictionaries with duplicate keys. In this example, half of each nested dictionary is duplicate code common to all the nested dictionaries. I have tried using tuples to eliminate the duplicate keys, but this leads to fragile code: values are looked up by position rather than by key, so any change in position breaks the lookup. It also leads to code that is not explicit and is hard to follow.

people_dict = {
        "Charles Lindberg": ("123 St.", True), 
        "Me": ("456 St.", False),
        }    

>>> people_dict["Charles Lindberg"][0]
"123 St."

Instead, I write a class to encapsulate the same information. This successfully reduces duplicate code:

class Person(object):
    def __init__(self, address, famous=False):
        self.address = address
        self.famous = famous

people_dict = {
    "Charles Lindberg": Person("123 St.", famous=True), 
    "Me": Person("456 St."), 
    }

>>> people_dict["Charles Lindberg"].address
"123 St." 

Creating a class seems a little overkill... The standard data types seem too basic...

I imagine there's a better way to do this in Python, without having to write your own class?

What is the best way to avoid duplicate code when creating nested dictionaries with common keys?

OnStrike
  • 1
    why would you worry about duplicate keys in separate dicts? – Padraic Cunningham Oct 30 '14 at 22:54
  • To expand on the above, you don't need to worry about the (memory use of) duplicate string keys in the nested dicts. Python stores the string once in memory and then uses pointers/references to the same string. – Tom Dalton Oct 30 '14 at 22:57
  • Each sub-dictionary I create has 15 identical keys. In order to define the parent-dictionary, I end up copying and pasting a TON. It feels wrong to do that but I am not aware of a good alternative or best practices.. – OnStrike Oct 30 '14 at 22:57
  • 1
    If you're looking at it more from a performance standpoint, then namedtuples https://docs.python.org/2/library/collections.html#collections.namedtuple might be what you're after. If you can explain more about why you don't think the dictionaries are a good solution, it will help guide you to the solution more appropriate to your situation. – Tom Dalton Oct 30 '14 at 22:58
  • So is the issue that you are defining all the data manually in python files? So you are more interested in other storage types, like flat or structured files, a database, etc? – Tom Dalton Oct 30 '14 at 22:59
  • Yes, that's correct. I am manually creating a fairly large parent dictionary with nested dictionaries in a module, to reference in other modules. Each nested dictionary contains configuration settings and parameters unique to its key. – OnStrike Oct 30 '14 at 23:03
  • 1
    What you want is a named tuple. They're just like your little class, except they include more functionality and require less code to define. – ArtOfWarfare Oct 30 '14 at 23:13
  • Thanks for your input: I have edited the question to reflect your comments. I am not too familiar with namedtuple. I understand its definition but am uncertain how to implement it as an answer to the question. – OnStrike Oct 30 '14 at 23:24
  • Where have you heard repeated arguments against object-oriented code? Python is a deeply object-oriented language - everything within the language is an object. What's led you to think that you're better off using convoluted combinations of built-in types rather than creating your own types? – furkle Oct 31 '14 at 00:08
  • I didn't mean to say that OO is bad/wrong; I increasingly find myself creating more classes. My classes are frequently very small, and that feels like a misuse of classes. I have read pieces like "http://kishorelive.com/2012/03/18/the-real-problem-with-oo-is-taking-it-too-far/" that give me pause, especially when I am writing smaller projects... – OnStrike Oct 31 '14 at 00:29
  • @theNamesCross you should not add your own answer in the question. Instead accept the answer that tells you to use `namedtuple` or create your own answer. Now you have accepted a completely different answer from the one you describe as the best in your question – Joakim Nov 17 '16 at 10:34
  • @Joakim Thanks for your interest, fixed - I was unaware of SE etiquette when I originally posted, my apologies. – OnStrike Dec 01 '16 at 00:06

3 Answers

6

It sounds like you have a matrix of data, since every "row" has the same keys (columns), so I'd use a NumPy array:

import numpy as np

dtype = [('name', object), ('address', object), ('famous', bool)]
people = np.array([
        ("Charles Lindberg", "123 St.", True),
        ("Me", "456 St.", False),
        ], dtype)

charlie = people[people['name'] == 'Charles Lindberg'][0]
print(charlie['address'])

Or use Pandas, which is higher-level:

import pandas as pd
people = pd.DataFrame(people_dict)
print(people['Charles Lindberg']['address'])

That loads your original dict-of-dicts people_dict straight into the matrix very easily, and gives you similar lookup features.

John Zwinck
  • 1
    I recommend this approach, as I often find myself in the same situation. Although, I fail to see how people_dict is duplicative? I think you are correct in defining a Person class, and looking it up through key access is a standard, acceptable way to do it. Namedtuples are more lightweight than Python objects, but aren't going to save you much in terms of design effort. You can also make people_dict into a class that behaves like a dict with more custom functionality. – Adam Hughes Oct 31 '14 at 01:03
  • 1
    @AdamHughes That's a helpful breakdown: there is definitely a tradeoff between the lightweight nature of namedtuple and a class object. For the specific usage defined in the question, I think namedtuple is sufficient, plus it supports tuple methods. Why do you support the numpy approach over namedtuple? – OnStrike Oct 31 '14 at 02:02
  • I support pandas because in my applications, I almost always need to build on such a framework, and eventually I often need functionality that's not available through the basic container types. It depends on the project, I guess. Also, I am really an advocate for pandas, which is numpy-based, but wouldn't recommend a pure numpy approach, just to be clear to the OP. – Adam Hughes Oct 31 '14 at 15:41
  • Just to be clear, of the two approaches John pointed out, I'd only bother with the Pandas approach. Pretty much all numpy operations can be done in Pandas, so I'd avoid using numpy directly. – Adam Hughes Oct 31 '14 at 18:03
2

If you want a dict where all the values are dicts with the same or similar keys, you can define a function that takes the values and returns one of the inner dicts.

# Named make_person rather than hash to avoid shadowing the built-in hash().
def make_person(address, famous):
    return {"address": address, "famous": famous}

people_dict = {
    "Charles Lindberg": make_person("123 St.", True),
    "Me": make_person("456 St.", False)
}
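When there are many shared keys (the question mentions 15), the same idea can be written so that the keys are listed only once. The following is a minimal sketch, not part of the original answer; the names FIELDS and ROWS are hypothetical and chosen only for illustration.

```python
# Hypothetical names: FIELDS and ROWS are illustrative, not from the original answer.
FIELDS = ("address", "famous")  # the shared keys, written once

ROWS = {
    "Charles Lindberg": ("123 St.", True),
    "Me": ("456 St.", False),
}

# Pair each tuple of values with the shared field names to build the inner dicts.
people_dict = {name: dict(zip(FIELDS, values)) for name, values in ROWS.items()}

print(people_dict["Charles Lindberg"]["address"])
```

The values are still entered positionally, so the order of FIELDS has to match the order inside each tuple, but consumers of people_dict keep the explicit key lookups.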
Carl Smith
1

First, you can read the following link for more information about namedtuples: https://docs.python.org/2/library/collections.html#collections.namedtuple

Namedtuples can help you avoid "duplicate code". You can create a namedtuple for the address and use it to define your entries.
Personally, I prefer objects; OO offers a better solution for this problem. You can create a method to export the object to a dict.
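As a rough sketch of how the namedtuple approach could look for the example in the question (the type name Person and the variable people are assumptions made for illustration):

```python
from collections import namedtuple

# The field names are declared once; Person mirrors the class in the question.
Person = namedtuple("Person", ["address", "famous"])

people = {
    "Charles Lindberg": Person("123 St.", True),
    "Me": Person(address="456 St.", famous=False),  # keyword arguments also work
}

# Fields are read by name, as with the hand-written class.
print(people["Charles Lindberg"].address)

# _asdict() exports a record to a dict-like mapping of field name to value.
print(people["Me"]._asdict()["famous"])
```

Because a namedtuple is still a tuple, positional access such as people["Me"][0] keeps working, and _asdict() covers the "export to a dict" case without writing a method by hand.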

For the functional paradigm, it is better to use lists, arrays and dicts, because there are a lot of methods/functions to help with these structures (map, reduce, etc.). If you do not intend to use a functional style in your app, go with the OO (object-oriented) solution.

Regards Andre

Andre Fonseca