
I have the following function:

from lxml import etree
from typing import Union


def _get_inner_xml(element: Union[etree._Element, None]) -> Union[str, None]:
    if element is None:
        return None
        
    # See https://stackoverflow.com/a/51124963
    return (str(element.text or "") + "".join(etree.tostring(child, encoding="unicode") for child in element)).strip()


root = etree.fromstring('<html><body>TEXT<br/>TAIL</body></html>')
innerXML = _get_inner_xml(root)
print(innerXML)

My understanding of it is that if I pass None as an argument, I always get None as a return value. On the other hand, an etree._Element as argument will always result in a str return.

If I write the following in VS Code using Pylance (which uses Pyright under the hood):

def test(element: etree._Element):
    variable = _get_inner_xml(element)

In this case I get the type hint (variable) variable: str | None. I would expect Pylance to know that variable should be of type str. Am I overlooking something? Or is this a bug?

If this works as intended: is there a way to manually tell Pylance that "whenever this function gets an etree._Element it returns a str, and whenever I pass None it returns None"?

Alex Waygood
T-Dawg

2 Answers


The answer here is to use typing.overload, which lets you register multiple signatures for one function. Function definitions decorated with @overload are ignored at runtime (they exist only for the type checker), so their bodies can be a literal ellipsis ..., pass, or just a docstring. You also need to provide a "concrete" implementation of the function that is not decorated with @overload, placed after the overloaded signatures.

from lxml import etree
from typing import Union, overload

@overload
def _get_inner_xml(element: etree._Element) -> str:
    """Signature when `element` is of type `etree._Element`"""

@overload
def _get_inner_xml(element: None) -> None:
    """Signature when `element` is of type `None`"""

def _get_inner_xml(element: Union[etree._Element, None]) -> Union[str, None]:
    if element is None:
        return None
        
    # See https://stackoverflow.com/a/51124963
    return (str(element.text or "") + "".join(etree.tostring(child, encoding="unicode") for child in element)).strip()


root = etree.fromstring('<html><body>TEXT<br/>TAIL</body></html>')
innerXML = _get_inner_xml(root)
print(innerXML)
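
To try the pattern without installing lxml, here is a minimal sketch of the same idea. The `Node` class and the `get_text` function are hypothetical stand-ins invented for this illustration; they are not part of lxml or of the original function.

```python
from typing import Optional, overload


class Node:
    """Hypothetical stand-in for etree._Element so the sketch runs without lxml."""

    def __init__(self, text: Optional[str] = None) -> None:
        self.text = text


@overload
def get_text(node: Node) -> str: ...
@overload
def get_text(node: None) -> None: ...


def get_text(node: Optional[Node]) -> Optional[str]:
    # Concrete implementation: the only definition that is called at runtime.
    if node is None:
        return None
    return (node.text or "").strip()


# A type checker that supports overloads should pick the first signature here...
from_node = get_text(Node("  TEXT  "))
# ...and the second signature here.
from_none = get_text(None)
print(repr(from_node), repr(from_none))
```

Hovering over `from_node` and `from_none` in the editor should then show the narrower types rather than the union, though the exact display depends on the checker.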
Alex Waygood

This is not how type hinting works. To know that an input of etree._Element always results in a return of str and an input of None always results in None, the IDE would need to parse the function body, analyse all code paths, and derive that result.

I highly doubt that it is built to do that. Instead, the IDE simply reads the annotations in the signature and reports them as the hint. Type hints are just that: hints. They are not enforced when the code executes.

You may want to check with a simpler function:

from typing import Union


# This returns either None or a str; it simply returns whatever is passed in
def test(element: Union[str, None]) -> Union[str, None]:
    return element


should_be_str = test("should be a str as type hint return")
should_be_none = test(None) 

should_be_marked_as_type_mismatch = test(42) # works from the signature information

and see if your IDE picks that one up - I seriously doubt it.
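
If the signature stays as a plain union, the caller has to narrow the result. The following is a rough sketch of two common ways to do that, assuming a hypothetical `echo` function that mirrors `test` above; neither option is specific to Pylance.

```python
from typing import Optional, cast


def echo(value: Optional[str]) -> Optional[str]:
    # Hypothetical helper mirroring `test` above: returns its input unchanged.
    return value


result = echo("hello")  # declared type is Optional[str], whatever the argument

# Option 1: an explicit None check narrows the type inside the branch.
if result is not None:
    upper = result.upper()

# Option 2: typing.cast asserts the type to the checker; it does nothing at runtime.
forced = cast(str, echo("world"))
```

The None check is verified by the checker, while `cast` is an unchecked promise, so the check is usually the safer default.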

Patrick Artner
  • You are right, I get the same behavior with the function you sent. What is the best way to work with such a function in typed code? Explicitly cast the return value every time the function is called? – T-Dawg Aug 12 '21 at 12:47
  • This answer is correct, but could be more constructive, given that the problem is easily solved with `typing.overload` – Alex Waygood Aug 12 '21 at 13:05
  • @Alex thanks for pointing me to @overload, did not encounter that one yet - good way to solve this. – Patrick Artner Aug 12 '21 at 18:59