I have a nested structure read from YAML. It is composed of nested lists, nested dicts, or a mix of both at various levels of nesting. It can be assumed that the structure doesn't contain any recursive objects.

How do I extract only the leaf values from it? I also don't want any None values. The leaf values are strings, which are all I care about. Recursion is fine, since the maximum depth of the structure is not large enough to exceed the recursion limit. A generator would also be fine.

There are similar questions about flattening lists or dicts, but not a mix of both. Those that flatten a dict also return the flattened keys, which I don't need and which risk name conflicts.

I tried more_itertools.collapse, but its examples only show it working with nested lists, not with a mix of dicts and lists.

Sample inputs

struct1 = {
    "k0": None,
    "k1": "v1",
    "k2": ["v0", None, "v1"],
    "k3": ["v0", ["v1", "v2", None, ["v3"], ["v4", "v5"], []]],
    "k4": {"k0": None},
    "k5": {"k1": {"k2": {"k3": "v3", "k4": "v6"}, "k4": {}}},
    "k6": [{}, {"k1": "v7"}, {"k2": "v8", "k3": "v9", "k4": {"k5": {"k6": "v10"}, "k7": {}}}],
    "k7": {
        "k0": [],
        "k1": ["v11"],
        "k2": ["v12", "v13"],
        "k3": ["v14", ["v15"]],
        "k4": [["v16"], ["v17"]],
        "k5": ["v18", ["v19", "v20", ["v21", "v22", []]]],
    },
}

struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]

Expected outputs

struct1_leaves = {f"v{i}" for i in range(23)}
struct2_leaves = {f"{s}{s}" for s in "abcdefg"}
Asclepius

3 Answers

Another possibility is to use a generator with recursion:

struct1 = {'k0': None, 'k1': 'v1', 'k2': ['v0', None, 'v1'], 'k3': ['v0', ['v1', 'v2', None, ['v3'], ['v4', 'v5'], []]], 'k4': {'k0': None}, 'k5': {'k1': {'k2': {'k3': 'v3', 'k4': 'v6'}, 'k4': {}}}, 'k6': [{}, {'k1': 'v7'}, {'k2': 'v8', 'k3': 'v9', 'k4': {'k5': {'k6': 'v10'}, 'k7': {}}}], 'k7': {'k0': [], 'k1': ['v11'], 'k2': ['v12', 'v13'], 'k3': ['v14', ['v15']], 'k4': [['v16'], ['v17']], 'k5': ['v18', ['v19', 'v20', ['v21', 'v22', []]]]}}
def flatten(d):
    # Iterate over a dict's values, or over d itself when it has no .values() (a list).
    for i in getattr(d, 'values', lambda: d)():
        if isinstance(i, str):
            yield i
        elif i is not None:
            yield from flatten(i)

print(set(flatten(struct1)))

Output:

{'v10', 'v9', 'v8', 'v7', 'v0', 'v18', 'v16', 'v1', 'v21', 'v11', 'v14', 'v15', 'v12', 'v13', 'v4', 'v2', 'v5', 'v20', 'v6', 'v19', 'v3', 'v22', 'v17'}

struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
print(set(flatten(struct2)))

Output:

{'cc', 'ff', 'dd', 'gg', 'bb', 'ee', 'aa'}
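
The `getattr` approach above relies on every non-None leaf being a string; a numeric or boolean leaf, which YAML can produce, would be passed back into `flatten` and fail when iterated. As a minimal sketch, assuming only dicts and lists act as containers, a generator with explicit type checks would tolerate such leaves. The names `iter_leaves` and `sample` are made up for illustration:

```python
from typing import Any, Iterator


def iter_leaves(struct: Any) -> Iterator[Any]:
    """Yield the non-None leaves of nested dicts and lists, depth first."""
    if isinstance(struct, dict):
        # Keys are ignored; only the values are descended into.
        for value in struct.values():
            yield from iter_leaves(value)
    elif isinstance(struct, list):
        for item in struct:
            yield from iter_leaves(item)
    elif struct is not None:
        # Anything that is not a dict, a list, or None is treated as a leaf.
        yield struct


# Hypothetical sample with non-string leaves mixed in.
sample = {"a": None, "b": ["x", 1, {"c": "y", "d": []}], "e": 2.5}
print(list(iter_leaves(sample)))  # ['x', 1, 'y', 2.5]
```

Wrapping the call in `set(...)` gives the same kind of result as the outputs shown above.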
Ajax1234

This is an adaptation of the reference answer that uses an inner function and a single set. It still uses recursion and produces the expected outputs for the sample inputs included in the question, but it avoids passing every leaf back up through the entire call stack.

from typing import Any, Set


def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref: https://stackoverflow.com/a/59832594/
    values = set()

    def add_leaves(struct_: Any) -> None:
        if isinstance(struct_, dict):
            for sub_struct in struct_.values():
                add_leaves(sub_struct)
        elif isinstance(struct_, list):
            for sub_struct in struct_:
                add_leaves(sub_struct)
        elif struct_ is not None:
            values.add(struct_)

    add_leaves(struct)
    return values
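
If recursion depth were ever a concern, the same single-set idea could be sketched without recursion by keeping an explicit stack of nodes still to visit. This is a different technique from the inner function above, not a restatement of it, and the name `leaves_iterative` is made up for illustration:

```python
from typing import Any, List, Set


def leaves_iterative(struct: Any) -> Set[Any]:
    """Collect the non-None leaves of nested dicts and lists without recursion."""
    values: Set[Any] = set()
    stack: List[Any] = [struct]  # nodes (containers or leaves) still to visit
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
        elif node is not None:
            values.add(node)
    return values


struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
assert leaves_iterative(struct2) == {f"{s}{s}" for s in "abcdefg"}
```

Because the result is a set, the visiting order imposed by the stack does not affect the output.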
Asclepius

This is a straightforward reference solution that uses recursion to produce the expected outputs for the sample inputs included in the question.

from typing import Any, Set


def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref: https://stackoverflow.com/a/59832362/
    values = set()
    if isinstance(struct, dict):
        for sub_struct in struct.values():
            values.update(leaves(sub_struct))
    elif isinstance(struct, list):
        for sub_struct in struct:
            values.update(leaves(sub_struct))
    elif struct is not None:
        values.add(struct)
    return values
Asclepius
  • Seems inefficient to pass every leaf through the entire call stack, updating it into sets over and over again. – Kelly Bundy Jan 21 '20 at 00:01
  • You could have a recursive inner function that doesn't return anything but adds everything to the set held by the outer function (non-recursive, just calls the inner). – Kelly Bundy Jan 21 '20 at 00:04