I have a dataclass and I want to iterate over it in a loop to spit out each of the values. I'm able to write a very short __iter__() within it easily enough, but is that what I should be doing? I don't see anything in the documentation about an 'iterable' parameter or anything, but I just feel like there ought to be...

Here is what I have which, again, works fine.

from dataclasses import dataclass

@dataclass
class MyDataClass:
    a: float
    b: float
    c: float

    def __iter__(self):
        for value in self.__dict__.values():
            yield value

thing = MyDataClass(1, 2, 3)
for i in thing:
    print(i)
# outputs 1,2,3 on separate lines, as expected

Is this the best / most direct way to do this?

scotscotmcc
  • Just to be clear, it's not a great idea to implement this in terms of `self.__dict__` (at least for drop-in code that's supposed to work with any dataclass). The main reason being that if `__slots__` is defined manually or (3.10+) the decorator uses `@dataclass(slots=True)` (at any layer in the inheritance hierarchy) to make a slotted dataclass (gives dramatically lower memory overhead per-instance than an unslotted one, at the cost of not allowing autovivified attributes or weak references by default), then the slotted attributes won't appear in `__dict__`; `__dict__` might not even exist. – ShadowRanger Nov 10 '22 at 19:19
  • A more minor objection is that it's usually a good idea to minimize use of special names when you can, in favor of using the other mechanisms that grant you access to them implicitly; you don't write `a = 1`, `b = 2`, `c = a.__add__(b)`, you write `c = a + b`, and similarly, `vars(self)` would give you the underlying `__dict__` in the rare cases you need it. That's a more debatable style suggestion though (though it has some behavior implications in weird cases, where an instance overrides a special name on the class, and the implicit approach efficiently/correctly bypasses the instance). – ShadowRanger Nov 10 '22 at 19:21
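
To make the `__slots__` concern from the comments concrete, here is a minimal sketch. It assumes a hypothetical `Point` class with manually defined `__slots__` (on 3.10+, `@dataclass(slots=True)` should raise the same issue). The instance has no `__dict__` at all, so a `self.__dict__`-based `__iter__` would fail, while iterating over `dataclasses.fields` still works:

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class, not from the question
    __slots__ = ("x", "y")  # manual slots: instances get no __dict__
    x: float
    y: float

    def __iter__(self):
        # Uses the dataclass field metadata instead of self.__dict__
        return (getattr(self, f.name) for f in dataclasses.fields(self))


p = Point(1.0, 2.0)
has_dict = hasattr(p, "__dict__")  # False: nothing for __dict__.values() to read
values = list(p)                   # [1.0, 2.0]
print(has_dict, values)
```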

2 Answers

The simplest approach is probably to iteratively extract the fields, following the guidance in the dataclasses.astuple documentation for creating a shallow copy, just omitting the call to tuple (leaving it a generator expression, which is a legal iterator for __iter__ to return):

# Both variants assume `import dataclasses` at module level.
def __iter__(self):
    return (getattr(self, field.name) for field in dataclasses.fields(self))

# Or writing it directly as a generator itself instead of returning a genexpr:
def __iter__(self):
    for field in dataclasses.fields(self):
        yield getattr(self, field.name)

Unfortunately, astuple itself is not suitable (as it recurses, unpacking nested dataclasses and structures). asdict (followed by a .values() call on the result) is suitable, but it involves eagerly constructing a temporary dict and recursively copying the contents, which is relatively heavyweight (memory-wise and CPU-wise); better to avoid unnecessary O(n) eager work.
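
As a rough illustration of the recursion difference, the sketch below uses two hypothetical classes, `Inner` and `Outer` (neither is from the question). The fields-based `__iter__` hands back the nested instance itself, whereas `astuple` unpacks it into a plain tuple:

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Inner:  # hypothetical nested dataclass
    x: int
    y: int


@dataclass
class Outer:  # hypothetical outer dataclass
    name: str
    inner: Inner

    def __iter__(self):
        # Shallow: yields each attribute as-is, without recursing
        return (getattr(self, f.name) for f in dataclasses.fields(self))


outer = Outer("o", Inner(1, 2))
shallow = list(outer)                   # nested Inner instance is preserved
recursed = dataclasses.astuple(outer)   # nested Inner is unpacked into a tuple
print(shallow)
print(recursed)
```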

asdict would be suitable if you want/need to avoid using live views (if later attributes of the instance are replaced/modified midway through iterating, the asdict result wouldn't change, since it actually guarantees they're deep copied up-front, while the genexpr would reflect the newer values when you reached them). The implementation using asdict is even simpler (if slower, due to the eager pre-deep copy):

def __iter__(self):
    yield from dataclasses.asdict(self).values()

# or avoiding a generator function:
def __iter__(self):
    return iter(dataclasses.asdict(self).values())
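
The live-view versus snapshot distinction can be sketched as follows. The class names `Live` and `Snapshot` are hypothetical and exist only to put the two `__iter__` styles side by side; an attribute is changed after iteration has started:

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Live:  # hypothetical: lazy, fields-based iteration
    a: int
    b: int

    def __iter__(self):
        return (getattr(self, f.name) for f in dataclasses.fields(self))


@dataclass
class Snapshot:  # hypothetical: eager, asdict-based iteration
    a: int
    b: int

    def __iter__(self):
        return iter(dataclasses.asdict(self).values())


live, snap = Live(1, 2), Snapshot(1, 2)
live_it, snap_it = iter(live), iter(snap)
next(live_it)
next(snap_it)

# Modify both instances midway through iteration
live.b = 99
snap.b = 99

rest_live = list(live_it)  # sees the new value
rest_snap = list(snap_it)  # still sees the value copied up-front
print(rest_live, rest_snap)
```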

There is a third option, which is to ditch dataclasses entirely. If you're okay with making your class behave like an immutable sequence, then you get iterability for free by making it a typing.NamedTuple (or the older, less flexible collections.namedtuple) instead, e.g.:

from typing import NamedTuple

class MyNotADataClass(NamedTuple):
    a: float
    b: float
    c: float

thing = MyNotADataClass(1,2,3)
for i in thing:
    print(i)
# outputs 1,2,3 on separate lines, as expected

and that is iterable automatically (you can also call len on it, index it, or slice it, because it's an actual subclass of tuple with all the tuple behaviors; it just also exposes its contents via named properties).
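
A short sketch of those tuple behaviors, using a hypothetical `Triple` class equivalent to the one above. It also shows the trade-off: attributes cannot be reassigned after construction.

```python
from typing import NamedTuple


class Triple(NamedTuple):  # hypothetical name, same shape as the example above
    a: float
    b: float
    c: float


t = Triple(1, 2, 3)
length = len(t)                   # tuple length
first = t[0]                      # positional indexing
tail = t[1:]                      # slicing returns a plain tuple
a, b, c = t                       # unpacking works like any tuple
is_tuple = isinstance(t, tuple)

# Unlike a regular dataclass, fields are read-only
try:
    t.a = 10
    mutable = True
except AttributeError:
    mutable = False

print(length, first, tail, is_tuple, mutable)
```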

ShadowRanger

Just use dataclasses.asdict to get a dictionary.

In [28]: from dataclasses import asdict
In [29]: [v for v in asdict(MyDataClass(1, 2, 3)).values()]
Out[29]: [1, 2, 3]

You can also access the attribute names alongside the values if you use .items().

In [30]: [(k, v) for k, v in asdict(MyDataClass(1, 2, 3)).items()]
Out[30]: [('a', 1), ('b', 2), ('c', 3)]
suvayu
  • @SilvioMayolo I expanded it, is it better? Would you add something else? Thanks for your comment :) – suvayu Nov 10 '22 at 18:57
  • It might help to demonstrate how you'd implement `__iter__` itself (in practice, you wouldn't actually want a listcomp in either case; no-op listcomps, if needed, should just be `list()` constructor calls, and in this case, implementing `__iter__` with `yield from asdict(MyDataClass(1, 2, 3)).values()` would avoid the need for yet another temporary). – ShadowRanger Nov 10 '22 at 19:00
  • @ShadowRanger Why does one need to implement `__iter__` when directly using `asdict` does the job? It's also very explicit, no hidden magic what's going on. Dataclasses are essentially like structs, no need to complicate things. – suvayu Nov 10 '22 at 19:04
  • The OP specifically asked how to make it possible to do `for item in instance_of_mydataclass:`; you *could* use `asdict` in every place, but if the class is logically iterable, it's nice to people using it to actually make it iterable. – ShadowRanger Nov 10 '22 at 19:06
  • The OP asked "I have a dataclass and I want to iterate over in in a loop to spit out each of the values", dataclasses are not containers. Semantically they are not meant to be iterated. If that's the requirement, OP should use something else, a dictionary or list perhaps. – suvayu Nov 10 '22 at 19:09
  • Dataclasses are not semantically *anything* but an easy way to generate a lot of boilerplate code for common class patterns. If a *particular* dataclass is supposed to be iterable (while others are not), defining `__iter__` on it makes sense. They're not "like structs", they're exactly like any other class, you're just saved the trouble of typing so much (including sometimes getting bonus features, like `match` compatibility, as the code generation can be updated in new versions of Python). There's no moral distinction between iterable regular classes and iterable dataclasses. – ShadowRanger Nov 10 '22 at 19:13
  • From [PEP 557](https://peps.python.org/pep-0557/): "Data Classes can be thought of as “mutable namedtuples with defaults”." So they are indeed like structs (or records). You are right that code generation is extra, but that's just convenience for comparison, or if you want something hashable. – suvayu Nov 10 '22 at 19:23
  • Sigh... Read the *very next sentence* after your quote: "Because Data Classes use normal class definition syntax, you are free to use inheritance, metaclasses, docstrings, user-defined methods, class factories, and other Python class features." The module provides direct support for those simple record features, but that does not mean there is any reason *not* to use it when you *also* want a full-fledged class with additional behaviors. There is no law or custom that forbids making *every* class a data class, just to save typing, even if they're way more than simple struct/records. – ShadowRanger Nov 10 '22 at 19:28
  • No one is preventing anyone from doing anything. However, I made a very specific comment: if iteration is a primary requirement, maybe there are better alternatives. Also, the OP seemed to have been stuck on how to get the keys/values, so pointing to `asdict` was sufficient to answer the question. I don't see why we are really having this long-winded discussion about `__iter__` when the OP already knows how to extend a class to support iteration. – suvayu Nov 10 '22 at 19:35