I am working with Python 3 and am just starting to learn about dataclasses.

I am trying to create a dataclass with an attribute that is a list of instances of the same class.

Something like:

@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    ...
    subdirectories: List[Directory] = []

What I am struggling with is how to define the subdirectories attribute, which is a list of Directory objects.

If I try this:

dir1 = Directory('folder1')
dir2 = Directory('folder2')
dir = Directory(subfolders=[dir1, dir2])

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    class Directory:
  File "main.py", line 17, in Directory
    subfolders: List(Directory) = []
NameError: name 'Directory' is not defined

I saw one post here, but it doesn't look like what I need.

mittal

1 Answer

Seems like a good start to me so far, though you have a few minor typos:

  1. Change def to class, since you're creating a class. Dataclasses are just regular Python classes.
  2. For forward references (here, Directory is not yet defined when its own annotation is evaluated), wrap the type name in single or double quotes so that it becomes a string and is evaluated lazily.
  3. Use dataclasses.field() with a default_factory argument for mutable defaults such as list, dict, and set.
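
As a minimal sketch of why point 3 matters: once the forward reference is quoted, the NameError goes away, but the dataclass decorator then rejects the bare `[]` default with a ValueError. The class names `Broken` and `Node` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class Broken:
        # The quoted name avoids the NameError, but the bare list
        # default is rejected when the decorator runs.
        children: List['Broken'] = []
except ValueError as exc:
    error = str(exc)
    print(error)


@dataclass
class Node:
    # default_factory builds a fresh list for every instance.
    children: List['Node'] = field(default_factory=list)


a, b = Node(), Node()
a.children.append(b)
print(len(a.children), len(b.children))  # 1 0
```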

Example code putting it all together:

import random
import string
from dataclasses import field, dataclass
from typing import List


def generate_randomly():
    return ''.join(random.choice(string.ascii_letters) for _ in range(15))


@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    subdirectories: List['Directory'] = field(default_factory=list)


print(Directory())
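
With a definition like the one above, the construction attempted in the question should work, provided the keyword matches the field name (`subdirectories`). The sketch below repeats the class so that it runs on its own; `total_files` is a hypothetical helper added only to show walking the tree recursively.

```python
import random
import string
from dataclasses import dataclass, field
from typing import List


def generate_randomly():
    return ''.join(random.choice(string.ascii_letters) for _ in range(15))


@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    subdirectories: List['Directory'] = field(default_factory=list)


def total_files(d: Directory) -> int:
    # Hypothetical helper: sum num_of_files over the whole tree.
    return d.num_of_files + sum(total_files(s) for s in d.subdirectories)


dir1 = Directory('folder1', num_of_files=2)
dir2 = Directory('folder2', num_of_files=3)
root = Directory(subdirectories=[dir1, dir2])  # name comes from the factory

print(len(root.name))     # 15
print(total_files(root))  # 5
```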

In Python 3.7+, you can use a __future__ import so that all annotations are stored as strings and are not evaluated when the class is defined. You then don't need the quotes, or even the import from the typing module.

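from __future__ import annotations

import random
import string
from dataclasses import field, dataclass


def generate_randomly():
    return ''.join(random.choice(string.ascii_letters) for _ in range(15))


@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    # No quotes needed: the annotation is not evaluated at definition time
    subdirectories: list[Directory] = field(default_factory=list)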

Dataclasses don't check types at runtime, so to validate that each element of subdirectories is actually a Directory, you can add a check in __post_init__():

    def __post_init__(self):
        # 'sub' rather than 'dir', to avoid shadowing the built-in dir()
        for sub in self.subdirectories:
            if not isinstance(sub, Directory):
                raise TypeError(f'{sub}: invalid type ({type(sub)}) for subdirectory')
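
For context, here is a self-contained sketch of that check inside the class, with one valid and one invalid construction. The fixed default name `'unnamed'` is an assumption made to keep the example short; it stands in for the random-name factory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Directory:
    name: str = 'unnamed'  # assumption: fixed default, for brevity only
    num_of_files: int = 0
    subdirectories: List['Directory'] = field(default_factory=list)

    def __post_init__(self):
        # Runs right after the generated __init__ has set the fields.
        for sub in self.subdirectories:
            if not isinstance(sub, Directory):
                raise TypeError(f'{sub!r}: invalid type ({type(sub)}) for subdirectory')


ok = Directory('root', subdirectories=[Directory('child')])

try:
    Directory('root', subdirectories=['not-a-directory'])
except TypeError as exc:
    message = str(exc)
    print(message)
```

Note that the check runs only at construction time; items appended to `subdirectories` afterwards are not validated.
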
rv.kvetch
  • Thanks. I believe I also lack an understanding of annotations :( Is `list[Directory]` just a hint for documentation, or will it actually fail if the list contains any other data type? – mittal Aug 04 '22 at 14:12
  • `list[Directory]`, or rather `list[T]` (where T is any type), is, I think, valid from Python 3.9+ even without the `__future__` import. See [PEP 585](https://peps.python.org/pep-0585/) for more details on when this was introduced. – rv.kvetch Aug 04 '22 at 14:13
  • @mittal if I'm understanding correctly, it's just a type hint, and only IDEs such as PyCharm or type checkers such as mypy will use it to help you catch errors and complain when you pass an invalid type as an argument. However, it is not enforced at runtime, so your program won't crash if you pass an invalid type. – rv.kvetch Aug 04 '22 at 14:15
  • Ah, then should I check whether each element of the list `is_dataclass` in `__post_init__()`? Or will `isinstance()` suffice to check that it is actually an object of type `Directory`? – mittal Aug 04 '22 at 14:30
  • Hmm... can you explain a bit more about your use case? It sounds like you might be getting a dict or some other kind of object for `subdirectories`, and you want to ensure that each one is a `Directory`. But to answer your question, `isinstance()` in post init should be sufficient. I updated the answer with a sample implementation above. – rv.kvetch Aug 04 '22 at 14:44
  • The caller will define the desired `Directory` hierarchy, specifying permissions, name, number of files and subfolders at each level. The subfolders attribute will itself be a list of `Directory`, so it will go down recursively all the way to the leaf Directory. – mittal Aug 04 '22 at 17:03