Data Classes vs Dictionaries

Question

I've been learning about dataclasses, and was reworking an old project trying to integrate a dataclass into the program in place of a dictionary system I was using. The code blocks below are essentially the respective new and old methods being used to build a dataframe of several thousand items. My problem is I don't understand the use-case for the dataclass over a dictionary.

What I want to know is:

When should I use a dataclass over a dictionary (or vice versa)?
Programmatically, in this instance of simply cataloguing data, is either method more efficient/optimized than the other?
In actual practice, is either method encouraged over the other (for reasons of efficiency, readability, industry standards, or otherwise)?

Method using @dataclass

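from dataclasses import dataclass
from typing import Optional

import pandas as pd

@dataclass
class Car:
    # Optional[...] because both fields default to None until they are filled in
    year: Optional[int] = None
    model: Optional[str] = None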

def main():
    foo = {}
    for name in car_list:
        bar = Car()
        bar.year = get_year(name)
        bar.model = get_model(name)
        
        foo[name] = vars(bar)

    df = pd.DataFrame.from_dict(foo)

Method using Dictionary

def main():
    foo = {}
    for name in car_list:

        bar = {
            'year': None,
            'model': None,
        }

        bar['year'] = get_year(name)
        bar['model'] = get_model(name)
        
        foo[name] = bar

    df = pd.DataFrame.from_dict(foo)

There is no real reason to use a dataclass here if you just want to ultimately create a pandas dataframe. A dictionary is a reasonable intermediate format for that — juanpa.arrivillaga, Oct 18 '22 at 23:18
"When should I use a dataclass over a dictionary (or vice versa)?" A dataclass defines a record type, a dictionary is a mapping type. Although dictionaries are often used like record types, those are two distinct use-cases. — juanpa.arrivillaga, Oct 18 '22 at 23:19
@IgnatiusReilly that is such an old question that doesn't really adhere to the current standards, the most up-voted answer is basically just someone's opinion — juanpa.arrivillaga, Oct 18 '22 at 23:20
A class decorated with @dataclass is just a class with a library-defined __init__(). So, use the class if you need OOP features (methods, inheritance, etc.). Your example code ALONE shows no merit in defining a class, only boilerplate. For converting into a DataFrame, I recommend DataFrame(data=...) rather than DataFrame.from_dict() if you need performance. — relent95, Oct 19 '22 at 01:45
Further, the accepted answer in the linked question has 22 downvotes apparently (now 23). I guess the reason why the answer is so controversial is probably because it’s just “that guy’s opinion”. Well that, and there is no code or really anything else substantial to it. — rv.kvetch, Oct 19 '22 at 03:19

score 2 · Accepted Answer · answered Oct 19 '22 at 06:22

As discussed in the comments, there is a lot of discussion (and opinions) regarding this particular comparison. After doing several hours of research, there are a few main points I'd like to lay out for anyone else who may have this question in the future.

1. In regard to efficiency

Dictionaries are simpler data containers and thus will generally be more efficient. Under the hood, instances of ordinary classes and dataclasses store their attributes in a dictionary (__dict__) by default, with a bit more going on around it. The top answer on this SO post provides insight into how much more efficient dictionaries are than dataclasses when undergoing various tasks (creating a container can be as much as 5x as slow, whereas accessing the data is only about 1.2-1.25 times as slow). Various other accounts on the web demonstrate similar results.
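If you'd rather measure this on your own machine than trust quoted figures, a rough sketch with the standard-library timeit module could look like the following. The Car fields mirror the question; make_dict, make_dataclass, time_call, and the sample values are hypothetical names I made up for illustration, and the numbers you get will vary with Python version and hardware.

```python
import timeit
from dataclasses import dataclass
from typing import Optional


@dataclass
class Car:
    year: Optional[int] = None
    model: Optional[str] = None


def make_dict():
    # Build the record as a plain dictionary.
    return {"year": 2020, "model": "Civic"}


def make_dataclass():
    # Build the same record as a dataclass instance.
    return Car(year=2020, model="Civic")


def time_call(func, number=10_000):
    """Return the total seconds taken to call func `number` times."""
    return timeit.timeit(func, number=number)


dict_seconds = time_call(make_dict)
dataclass_seconds = time_call(make_dataclass)

print(f"dict creation:      {dict_seconds:.4f} s")
print(f"dataclass creation: {dataclass_seconds:.4f} s")
print(f"ratio (dataclass / dict): {dataclass_seconds / dict_seconds:.2f}x")
```

This only times container creation. Attribute access versus key lookup can be timed the same way by swapping in different functions.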

2. Functional Differences

A major point aside from speed is the control over mutability. It's not impossible to make elements of a dictionary immutable, but doing so generally requires creating classes or functions, or importing some library. Dataclasses, on the other hand, allow instances to be frozen after creation by simply passing frozen=True into the decorator of the dataclass. Besides blocking changes to existing values, this also prevents any attributes from being added, accidentally or otherwise.
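As a minimal sketch of what frozen=True does (FrozenCar and the sample values are hypothetical, chosen just for this illustration), both reassigning a field and adding a new attribute raise dataclasses.FrozenInstanceError:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenCar:
    year: int
    model: str


car = FrozenCar(year=2020, model="Civic")

try:
    car.year = 2021  # attempt to change an existing field
except FrozenInstanceError as exc:
    print("cannot modify:", exc)

try:
    car.color = "red"  # attempt to add a brand-new attribute
except FrozenInstanceError as exc:
    print("cannot add:", exc)
```

Note that freezing is shallow: if a field holds a mutable object such as a list, the list itself can still be changed in place.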

Other decorator arguments provide potentially even finer control over class creation. This video is an excellent, beginner-friendly resource that demonstrates several attributes and use cases of dataclasses.

Type hints are another reason one might prefer to use dataclasses. While type hints can be used with dictionaries and their values, a dataclass declares the name and type of every field in one place, which static type checkers and editors can then check against. This is a great write-up on Medium about a team that refactored a project to use dataclasses instead of dictionaries.
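To make the type-hint comparison concrete, here is a small sketch that puts a typing.TypedDict (the usual way to annotate a dictionary's keys, available from Python 3.8) next to a dataclass. CarDict and the sample values are hypothetical names for this example. In both cases the annotations are for static tools only; Python does not enforce them at runtime.

```python
from dataclasses import dataclass, fields
from typing import TypedDict


class CarDict(TypedDict):
    # Type hints for a dictionary-based record.
    year: int
    model: str


@dataclass
class Car:
    # The same record declared as a dataclass.
    year: int
    model: str


as_dict: CarDict = {"year": 2020, "model": "Civic"}
as_object = Car(year=2020, model="Civic")

# The declared fields of a dataclass can be inspected at runtime.
print([(f.name, f.type) for f in fields(Car)])

# Annotations are not enforced at runtime: this runs without error,
# although a static type checker would be expected to flag it.
unchecked = Car(year="twenty-twenty", model="Civic")
print(type(unchecked.year))
```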

3. Which one is right for me?

I've spent the better part of an afternoon learning why it's hard to find an answer to this question: it depends. If one were objectively better than the other, the other would have been deprecated. Dictionaries are simpler: they require no imports, can be created, accessed and mutated with ease, and will produce faster results. Dataclasses, on the other hand, allow the user finer control. This can be especially important when working on large projects with several team members.

A general heuristic when designing a program is to defer to the simplest structure when possible. In my particular case, I don't need the additional functionality of dataclasses when my goal is simply to create a dataframe, and my input data is somewhat reliable. Using a data class doesn't noticeably slow my code down, but if I were to take on several more inputs, I might see performance take a hit.
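For what it's worth, switching between the two later is cheap. The sketch below (with a hypothetical raw mapping standing in for car_list, get_year, and get_model from the question) shows that dataclasses.asdict produces the same nested dictionaries as building them by hand, so either version can be handed to pandas in the same way.

```python
from dataclasses import dataclass, asdict


@dataclass
class Car:
    year: int
    model: str


# Hypothetical stand-in for the question's car_list / get_year / get_model.
raw = {"car_a": (2019, "Civic"), "car_b": (2021, "Model 3")}

# Plain-dictionary approach.
records_from_dicts = {
    name: {"year": year, "model": model}
    for name, (year, model) in raw.items()
}

# Dataclass approach, converted back to dictionaries with asdict().
records_from_dataclasses = {
    name: asdict(Car(year=year, model=model))
    for name, (year, model) in raw.items()
}

print(records_from_dicts == records_from_dataclasses)  # True
```

Either mapping could then be passed to pd.DataFrame.from_dict as in the question. Passing orient="index" would give one row per car instead of one column per car.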
