0

I am wondering if there is a mechanism in python to fetch the "public" path to a class object in python (PEP8, __all__, "explicit reexport", ...).

To provide a minimal example, let's assume the following package structure:

|-- my_package
|   |-- __init__.py
|   |-- _my_module.py

Let this be the file contents of my_package.__my_module.py:

class MyObject: ...

Let this be the file contents of my_package.__init__.py:

from ._my_module import MyObject

__all__ = [
    "MyObject"
]

Is there a 'canonical' way of fetching the public path to MyObject - which should be my_package.MyObject (rather than my_package._my_module.MyObject)?

The following snippet works in the above example, but obviously bears a high risk of error (especially if you might want to use the returned string to do a programmatic import later on)

from typing import Any
from typing import Type


def get_public_path(to: Type[Any]) -> str:
    module = ".".join(list(filter(lambda k: not str(k).startswith("_"), to.__module__.split("."))))
    name = to.__qualname__
    return ".".join([module, name])


if __name__ == '__main__':
    from my_package import MyObject
    print(get_public_path(MyObject))

The example is a pretty common pattern, where the exact code location of MyObejct is considered an internal implementation detail of the package, which may change at any time without further note. Users of the package must import the object via from my_package import MyObject, which is considered the public api.

I would hope, that there is a clean and obvious way of getting the "public" class path without a huge bunch of meta-programming inspection (e.g. iterating the "private" module path, recursively looking for __all__ sections inside the module path etc.)

ddluke
  • 1
  • 2
  • Since you can reexport from multiple places, there may be many paths to an object. There’s no specification for what the canonical path should be. That’s left up to you and proper documentation and/or hinting that certain paths are *not* to be used by making them “protected” with an underscore. – deceze Jun 16 '23 at 05:24
  • That is true obviously, but I'm not necessarily interested in only fetching the class paths to objects in a package under my control. Think of a generic way of getting such details of any arbitrary installed python package (Standard Library, PyPI (pandas, airflow, ...), ...). Perhaps there is just no clean way of achieving this goal? But perhaps I'm also overseeing something here (which I hope is the case) – ddluke Jun 16 '23 at 05:49
  • If that is of help, the real world use case behind this question is a small utility of a code generator, which gives me the ast repr of an import statement for this class. If there is no generic way of achieving this, I might be forced to somehow pin the information (or map it in some sort of configuration) – ddluke Jun 16 '23 at 05:58
  • Yeah, I don't think there's a specified way to do this. I'd say typically the canonical path would be the shortest one available. I.e. if you could `from foo import bar` or `from foo.baz import bar`, then it's probably the former you want. But you can't really discover this without examining all parent modules of the class. – deceze Jun 16 '23 at 06:12

1 Answers1

0

Presumably, and in the lack of a clearly specified standard (there are potentially multiple paths to a class), there is no one obvious way to achieve this.

One clean way of achieving this would be to pin this information, e.g. in form of an enum.Enum:

from enum import Enum


class KnownTypes(str, Enum):
    MyObject = "my_package.MyObject"

And add good test coverage

import pytest
import importlib


@pytest.mark.parametrize(argnames="known_type", argvalues=KnownTypes)
def test_types(known_type: KnownTypes) -> None:
    # note that this doesn't work for class objects buried in another 
    class_module, class_name = self.value.rsplit(".", maxsplit=1)
    class_object = getattr(importlib.import_module(class_module), class_name)

If that grows out of hand, e.g. because the amount of types simply becomes too large to maintain, a reflection approach might be used as well, where a back reference to the original import statement (from my_package import MyObject) is used to generate the class path

ddluke
  • 1
  • 2