12

I have the following code:

def extract_table_date(bucket_path: str) -> str:
    event_date = re.search(r"date=([^/]+)", bucket_path)
    return event_date.group(1)[0:10].replace("-", "")

mypy throws error on the last line:

Item "None" of "Optional[Match[str]]" has no attribute "group"

I think I can solve that by assigning a type to event_date, and I can:

from typing import Match

def extract_table_date(bucket_path: str) -> str:
    event_date: Match = re.search(r"date=([^/]+)", bucket_path)
    return event_date.group(1)[0:10].replace("-", "")

but mypy now throws another error on the first line of the function:

Incompatible types in assignment (expression has type "Optional[Match[Any]]", variable has type "Match[Any]")

I don't really know how to inform mypy that the result won't be optional but nonetheless I followed the advice at Optional types and the None type by adding an assert:

from typing import Match

def extract_table_date(bucket_path: str) -> str:
    assert bucket_path is not None
    event_date: Match = re.search(r"date=([^/]+)", bucket_path)
    return event_date.group(1)[0:10].replace("-", "")

but mypy still raises the same error.

I try to fix by changing the type defined for event_date:

from typing import Match, optional, Any

def extract_table_date(bucket_path: str) -> str:
    assert bucket_path is not None
    event_date: Optional[Match[Any]] = re.search(r"date=([^/]+)", bucket_path)
    return event_date.group(1)[0:10].replace("-", "")

but (as expected) I'm now back to almost the same original error:

Item "None" of "Optional[Match[Any]]" has no attribute "group"

Any advice on how to fix this?

Chiel
  • 1,865
  • 1
  • 11
  • 24
jamiet
  • 10,501
  • 14
  • 80
  • 159

2 Answers2

18

The thing that's Optional is event_date, because re.search is not guaranteed to return a match. mypy is warning you that this will raise an AttributeError if that's the case. You can tell it "no, I'm very confident that will not be the case" by doing an assert to that effect:

def extract_table_date(bucket_path: str) -> str:
    event_date = re.search(r"date=([^/]+)", bucket_path)
    assert event_date is not None
    return event_date.group(1)[0:10].replace("-", "")

If you're wrong, this code will still raise an exception (AssertionError, because your assert will fail), but mypy will no longer error because there is now no way for event_date to be None when you access its group attribute.

Note that there is no need to assert on bucket_path because it's already explicitly typed as str.

Samwise
  • 68,105
  • 3
  • 30
  • 44
  • 6
    Then you'd have to type the return as `Optional[str]`, which may not be what is desired. Returning `None` when a `str` is expected is likely going to lead to another exception somewhere higher up the stack -- if you have a programming error, it's better to fail fast and loud than slowly and/or silently since it gives you a more useful stack trace to work with when you're debugging. :) – Samwise Jul 21 '21 at 14:36
  • Another option may be to return an empty string. That way the result is consistent and document the meaning of the empty `str`. Though I prefer `Optional[str]` if this is a simple code. – astrochun Jul 21 '21 at 14:42
  • 2
    The "correct" thing to do here for an API that's being used by other people is likely to have it raise a specific exception (I'd probably use `ValueError`) if the string doesn't have the expected content, and document the interface as "returns str, raises ValueError if there was no date in bucket_path". (This is similar to how `int("foo")` raises ValueError because the value of its parameter isn't a valid number.) But for a private function where you're less worried about having a super clean API, `assert` is quick and easy because it accomplishes mostly the same thing with one line of code. – Samwise Jul 21 '21 at 14:46
  • 1
    We actually do raise an error if the value doesn't have the expected content but I left that out of the question for reasons of brevity – jamiet Jul 21 '21 at 15:12
  • 1
    @jamiet If you'd left that in I could have given you a better answer! :) I assume your code is doing a whole separate regex to catch that case -- instead I'd do that checking by way of `if event_date is not None: raise (whatever)` after your main `re.search` call. That should satisfy `mypy` without needing the extra `assert` and it provides a pretty strong guarantee that your error checking is catching all the cases. – Samwise Jul 21 '21 at 15:25
  • Unfortunately it didn't satisfy mypy. The error messages I provided above were all produced from the function does that check. With one (probably vital) difference from what you provided above, instead of explicitly raising in the function we call an error raising function like so `raise_error(f"Invalid bucket path. Does not contain date: {bucket_path}")` – jamiet Jul 21 '21 at 15:29
  • "If you'd left that in I could have given you a better answer!" ha ha, apologies :) – jamiet Jul 21 '21 at 15:30
  • IMO you should change that `raise_error` API to `return` the error and do `raise error("...")` so that mypy knows that when you hit that branch the function stops. You can't type a function as "always raises", but you *can* type it as "returns an Exception" and then use it in a `raise` expression. – Samwise Jul 21 '21 at 15:31
  • update: there is now a `NoReturn` type so that you can in fact type a function as always raising. :) – Samwise Dec 16 '22 at 19:57
-2

Another possibility would be to use isinstance.

def extract_table_date(bucket_path: str) -> str:
    event_date = re.search(r"date=([^/]+)", bucket_path)
    if isinstance(event_date, str):
        return event_date.group(1)[0:10].replace("-", "")
    return ""

Here you would return "" instead of None.

Chiel
  • 1,865
  • 1
  • 11
  • 24