
I am trying to modify my project to be Pylance compliant and I am having the following issue:

Let's say that I have a function of the form:

def foo(a: int) -> int | list[int]:
    if a > 0:
        return a
    else:
        return [a]

Then, when I call len(foo(-2)) elsewhere in the code, Pylance reports an error. What is the best way to handle this?
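A minimal runnable reproduction of why the checker objects. The declared return type int | list[int] includes int, and int has no len(), so Pylance cannot accept len(foo(...)) for an arbitrary int argument, even one like -2. At runtime only the positive branch actually fails:

```python
from __future__ import annotations


def foo(a: int) -> int | list[int]:
    if a > 0:
        return a
    else:
        return [a]


print(len(foo(-2)))  # 1: the negative branch returns a list

try:
    len(foo(2))  # the positive branch returns a plain int
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: object of type 'int' has no len()
```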

If this helps, here is the real function:

    def read(self, path: Optional[str] = None) -> list[str] | h5py.Datatype | npt.ArrayLike:
        """read
        Wrapper around h5py's __getitem__. Directly returns the keys of the sub-groups if
        the path leads to an h5py.Group; otherwise it loads the dataset directly.

        This makes it possible to get the list of keys of a group without calling .keys(),
        and the data of a dataset without [()], so keys and data are accessed the same way.
        The user therefore does not need to switch between .keys() and [()] to navigate
        the hierarchical structure.

        Parameters
        ----------
        path : Optional[str], optional
            Path to the Group or Dataset you want to read.
            If the value is None, read the root of the folder
            (should be [datasets, metadata, process] if created with Hystorian), by default None

        Returns
        -------
        list[str] | h5py.Datatype | npt.ArrayLike
            If the path leads to a Group, returns a list of its subgroups;
            if it leads to a Dataset containing data, returns the data directly;
            and if it leads to an empty Dataset, returns its Datatype.

        """
        if path is None:
            return list(self.file.keys())
        else:
            current = self.file[path]
            if isinstance(current, h5py.Group):
                return list(current.keys())
            if isinstance(current, h5py.Datatype):
                return current
            else:
                return current[()]

This function is part of a context manager class that stores an h5py.File in the attribute self.file and adds extra methods to it (like this read function, used instead of the __getitem__ implemented by h5py, which also returns a different type of object depending on the path: an h5py.Group, an h5py.Datatype or an h5py.Dataset).

Therefore, when I call f.read() it returns the names of the h5py.Groups at the root, ['datasets', 'metadata', 'process'], but if I call f.read('datasets/values') and values is an h5py.Dataset, it returns the data directly.

One solution I can see so far is to check the return type after every call to read. However, since the function already does this type checking internally, this does not seem ideal.

Another solution would be to use # type: ignore, but this seems counterproductive when the whole point is to make the project Pylance compliant.

What I was thinking is to create three internal functions, _readkey, _readdata and _readtype, that are called by read, which would remain the function exposed to the API user, while internally I would call the three internal functions directly. But this also seems to defeat the purpose of read somewhat.

CoilM

2 Answers


If the result depends on the input, you could resolve this with the @overload decorator.

Instead of looking up self.file[path] inside the function, add current to the function signature and add an overload signature for each case:

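from typing import List, Optional, Union, overload

import h5py
import numpy.typing as npt

# Methods of the context-manager class, shown unindented for brevity.
# current is keyword-only so every overload is compatible with the implementation.

@overload
def read(self) -> List[str]: ...

@overload
def read(self, path: str) -> List[str]: ...

@overload
def read(self, *, current: h5py.Group) -> List[str]: ...

@overload
def read(self, *, current: h5py.Datatype) -> h5py.Datatype: ...

@overload
def read(self, *, current: npt.ArrayLike) -> npt.ArrayLike: ...

def read(
    self,
    path: Optional[str] = None,
    *,
    current: Optional[Union[h5py.Group, h5py.Datatype, npt.ArrayLike]] = None,  # needs a default after path
) -> Union[List[str], h5py.Datatype, npt.ArrayLike]:
    ...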
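One thing worth keeping in mind with this pattern: the @overload signatures exist only for the type checker. At runtime each decorated stub is replaced by the next definition, so the final undecorated def must contain all the real dispatch logic. A small stdlib-only sketch (lookup and unfinished are hypothetical names for illustration; type checkers will also flag unfinished for lacking an implementation):

```python
from __future__ import annotations

from typing import Optional, overload


@overload
def lookup(key: str) -> str: ...


@overload
def lookup(key: None = None) -> list[str]: ...


def lookup(key: Optional[str] = None) -> str | list[str]:
    # Only this undecorated definition runs at runtime,
    # so it must contain the real dispatch logic.
    if key is None:
        return ["a", "b"]
    return key.upper()


print(lookup())     # ['a', 'b']
print(lookup("x"))  # X


# Without an implementation, the name is bound to a placeholder that raises.
@overload
def unfinished(key: str) -> str: ...


try:
    unfinished("x")
except NotImplementedError as exc:
    print("NotImplementedError:", exc)
```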

Mixing return types is a bad idea: it makes the code harder to understand.

For example, print(len(foo(2))) raises a TypeError, because foo(2) returns an int.

If you want to use len(foo(-2)), you have to check that the returned value is a list first:

res = foo(-2)
if isinstance(res, list):
    print(len(res))
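Two other common ways to narrow the union, sketched as alternatives to the if isinstance(...) check: assert isinstance(...), which type checkers also use for narrowing but which is skipped when Python runs with -O, and typing.cast, which only tells the checker what to assume and performs no runtime check at all:

```python
from __future__ import annotations

from typing import cast


def foo(a: int) -> int | list[int]:
    if a > 0:
        return a
    else:
        return [a]


res = foo(-2)

# Option 1: assert isinstance narrows the union for the checker and raises
# AssertionError at runtime if the assumption is wrong (skipped under python -O).
assert isinstance(res, list)
print(len(res))  # 1

# Option 2: cast only tells the checker to assume list[int];
# it performs no runtime check, so a wrong assumption goes unnoticed here.
res2 = cast(list[int], foo(-3))
print(len(res2))  # 1
```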

This is what typing.overload is for. You can define multiple function call signatures that are distinct in their argument types (and optionally their return types). Example:

from typing import Optional, overload


@overload
def read(path: str) -> str: ...


@overload
def read(path: None = None) -> list[str]: ...


def read(path: Optional[str] = None) -> list[str] | str:
    if path is None:
        return ["foo"]
    return "bar"

Try it with the type checker of your choice:

from typing_extensions import reveal_type


reveal_type(read())       # "builtins.list[builtins.str]"
reveal_type(read(None))   # "builtins.list[builtins.str]"
reveal_type(read("abc"))  # "builtins.str"
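Note that overloads are chosen from the argument types at each call site, so they cannot express a rule based on a runtime value, such as the a > 0 test in foo from the question. If you are free to change the signature, one hedged option is to let the caller choose the result shape with a Literal-typed flag. as_list below is a hypothetical parameter, not part of the question's code:

```python
from __future__ import annotations

from typing import Literal, overload


# as_list is a hypothetical flag added for illustration; it is not in the question.
@overload
def foo(a: int, as_list: Literal[True]) -> list[int]: ...


@overload
def foo(a: int, as_list: Literal[False] = False) -> int: ...


@overload
def foo(a: int, as_list: bool) -> int | list[int]: ...  # fallback for non-literal bools


def foo(a: int, as_list: bool = False) -> int | list[int]:
    # The caller, not the runtime value of a, chooses the result type,
    # so the checker can pick the matching overload at each call site.
    return [a] if as_list else a


print(len(foo(-2, as_list=True)))  # 1: checker infers list[int]
print(foo(2) + 1)                  # 3: checker infers int
```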
Daniil Fajnberg