-1

In python, I have a function to read bytes from a file:

def read_bytes(path: pathlib.Path, tohex: bool = True, hexformat: Optional[Callable] = None) -> bytes | hex | Any:

In this function, hex is whether to convert the bytes to hex. hexformat is a callable which formats the hex.
For example:

read_bytes(pathlib.Path("myfile.txt"), tohex=True, hexformat = str.upper)

Yes, this function does more than one thing, which is bad practice. However, this function is purely theoretical. I was just trying to come up with an easy example of a function with two arguments that must exist together.

Logically, you cannot pass one of these arguments but not the other. You must pass both or neither. So, if this is the case, I want to raise an error:

def read_bytes(path: pathlib.Path, hex: bool = True, hexformat: Optional[Callable] = None) -> bytes | hex | Any:
    if hex and hexformat is not None:
        raise TypeError("hex and hexformat are ___")

However, I don't know what word to use (I put ___ as a placeholder). What is the terminology for this?


Edit:

I have another problem with this concept: If one of the parameters is a boolean has a default, how should I specify it in the signature?

For example, say I replace hexformat with toupper. toupper is a bool and it defaults to False. Should I specify that like this:

def read_bytes(path: pathlib.Path, tohex: bool = True, toupper: bool = False) -> bytes | hex | Any:

or like this:

def read_bytes(path: pathlib.Path, tohex: bool = True, toupper: bool = None) -> bytes | hex | Any:
    if toupper is None:
         toupper = False

In the former, I cannot check if the caller explicitly passed in toupper but set tohex to False, and raise an error if this is the case (since toupper has a default). On the other hand, the latter is more verbose.

Which is better?

  • 1
    "mutually dependent", maybe. But if you're using type annotations, I think it'd be better to express that via the function signature(s) anyway than via a potentially confusing exception message. Why do you need these to be two arguments? They convey redundant information -- you could just omit `hex` and infer that if `hexformat` is provided, the bytes should be converted to hex (using that formatting function). – Samwise May 14 '23 at 23:04
  • @Samwise But what if I don't want any formatting to be done? And remember that this function is only theoretical; I conceived it to express my question. – Zach Joseph May 14 '23 at 23:16
  • 1
    You said in your question that *both* must be passed or *neither* -- so according to that logic, it is *not permitted to ask for hex but not provide a formatter*. In which case there's no point in having a separate flag that indicates you're asking for hex. Providing the hex formatter means you want hex and you want it formatted; not passing a formatter means you don't want hex. (This doesn't make sense to me, but it's your example!) – Samwise May 14 '23 at 23:33
  • Since you're defaulting `hex` to True, I don't think there's an easy way (or perhaps any way at all) to tell if it was passed. – John Gordon May 14 '23 at 23:49
  • @Samwise Sorry, you're right. I made a mistake. I was thinking about two bools, and I don't know how I came up with that crazy nonsense. The example in the edit makes more sense. – Zach Joseph May 15 '23 at 00:17
  • @Samwise Sorry again, disregard that comment. I'm getting confused, and my brain's hurting from thinking about this. Just disregard everything - until I can wrap my head around this and fix my question, there's no point in it. – Zach Joseph May 15 '23 at 00:20

1 Answers1

2

In general, when different parameters are dependent on each other as you're describing, my tendency is to combine them so that mutually incompatible combinations are simply not possible within the signature of the function.

For example, I might write:

def read_bytes(path: pathlib.Path, tohex: bool = True, toupper: bool = False) -> bytes | hex | Any:

as:

class ByteFormat(Enum):
    BYTES = auto()
    HEX_LOWER = auto()
    HEX_UPPER = auto()

def read_bytes(path: pathlib.Path, format: ByteFormat) -> bytes | str:

Since there are logically three ways to format the output, it's much more straightforward to express that as an Enum with three values than two bools (which have four possible combinations) or worse, two optional bools (which have nine possible combinations, only three of which will be considered valid).

Another option is to use typing.overload to enumerate the possible combinations. This is more complicated, but it has the benefit that if the return type depends on the argument type, it's possible to express that:

from typing import overload, Literal

@overload
def read_bytes(path: pathlib.Path, tohex: Literal[False]) -> bytes: ...

@overload
def read_bytes(path: pathlib.Path, tohex: Literal[True], toupper: bool) -> str: ...

def read_bytes(path: pathlib.Path, tohex: bool=True, toupper: bool=None) -> bytes | str:
    # actual implementation goes here

When you use a static type checker, calls to the function are checked against the @overloads and you get an error if the call doesn't match any of them:

read_bytes(pathlib.Path(), False)  # ok
read_bytes(pathlib.Path(), True, False)  # ok
read_bytes(pathlib.Path(), True)   # error!
# No overload variant of "read_bytes" matches argument types "Path", "bool"
# Possible overload variants:
#     def read_bytes(path: Path, tohex: Literal[False]) -> bytes
#     def read_bytes(path: Path, tohex: Literal[True], toupper: bool) -> str
Samwise
  • 68,105
  • 3
  • 30
  • 44