
I'm using pydantic to validate JSON/dict input, and I'm also using mypy to check the type integrity of the code.

When I use the pydantic.constr type, which, among other things, validates that a string matches a regex, I get a mypy error.

Here is the code:

from typing import List

import pydantic

Regex = pydantic.constr(regex="[0-9a-z_]*")


class Data(pydantic.BaseModel):
    regex: List[Regex]


data = Data(**{"regex":["abc", "123", "etc"]})
print(data, data.json())

And here is the mypy output:

$ mypy main.py 
main.py:9: error: Variable "main.Regex" is not valid as a type
main.py:9: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases

I checked the documentation but could not find a way to handle this. I know I could create a static type for that regex, but that rather defeats the purpose of pydantic. The only way I could make this pass was with a # type: ignore comment, which is far from ideal.

Is there a way to handle this that keeps the benefits of both pydantic and mypy?

mihi
Cristiano Araujo

1 Answer

There are a few ways to achieve this:

Inheriting from pydantic.ConstrainedStr

Instead of using constr to specify the regex constraint (which uses pydantic.ConstrainedStr internally), you can inherit from pydantic.ConstrainedStr directly:

import re
import pydantic
from typing import List

class Regex(pydantic.ConstrainedStr):
    regex = re.compile("^[0-9a-z_]*$")

class Data(pydantic.BaseModel):
    regex: List[Regex]

data = Data(**{"regex": ["abc", "123", "asdf"]})
print(data)
# regex=['abc', '123', 'asdf']
print(data.json())
# {"regex": ["abc", "123", "asdf"]}

Mypy accepts this happily and pydantic does correct validation. The type of data.regex[i] is Regex, but as pydantic.ConstrainedStr itself inherits from str, it can be used as a string in most places.
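As a rough check of the claims above, the following sketch feeds the same model one valid and one invalid list. It assumes the pydantic 1.x API used throughout this answer; the try/except import is my own addition so that the snippet also runs where pydantic 2.x is installed (which ships the 1.x API as pydantic.v1), and the variable names first and rejected are only illustrative.

```python
import re
from typing import List

try:
    # Assumption: pydantic 2.x is installed and exposes the 1.x API here.
    from pydantic.v1 import BaseModel, ConstrainedStr, ValidationError
except ImportError:
    # Plain pydantic 1.x installation.
    from pydantic import BaseModel, ConstrainedStr, ValidationError


class Regex(ConstrainedStr):
    regex = re.compile("^[0-9a-z_]*$")


class Data(BaseModel):
    regex: List[Regex]


# Valid input: every entry matches the pattern.
data = Data(regex=["abc", "123"])

# Regex is a str subclass for the type checker, so this assignment typechecks.
first: str = data.regex[0]

# Invalid input: upper-case letters are outside the character class.
try:
    Data(regex=["abc", "ABC"])
    rejected = False
except ValidationError:
    rejected = True

print(first, rejected)
```

The invalid list is rejected as a whole, and the validated entries can be passed around wherever a str is expected.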

Using pydantic.Field

The regex constraint can also be specified as an argument to Field:

import pydantic
from pydantic import Field
from typing import List

class Regex(pydantic.BaseModel):
    __root__: str = Field(regex="^[0-9a-z_]*$")

class Data(pydantic.BaseModel):
    regex: List[Regex]

data = Data(**{"regex": ["abc", "123", "asdf"]})
print(data)
# regex=[Regex(__root__='abc'), Regex(__root__='123'), Regex(__root__='asdf')]
print(data.json())
# {"regex": ["abc", "123", "asdf"]}

Because Regex is not used directly as a field in a pydantic model (in your example it is an entry in a list), we have to introduce a wrapper model. __root__ makes the Regex model behave like its single field during validation and serialization (more details here).

But it has a drawback: the type of data.regex[i] is again Regex, but this time not inheriting from str. This results in e.g. foo: str = data.regex[0] not typechecking. foo: str = data.regex[0].__root__ has to be used instead.

I'm still mentioning this here because it might be the simplest solution when the constraint is applied directly to a field and not to a list entry (and typing.Annotated is not available, see below). For example:

class DataNotList(pydantic.BaseModel):
    regex: str = Field(regex="^[0-9a-z_]*$")

Using typing.Annotated with pydantic.Field

Instead of using constr to specify the regex constraint, you can specify it as an argument to Field and then use it in combination with typing.Annotated:

import pydantic
from pydantic import Field
from typing import Annotated

Regex = Annotated[str, Field(regex="^[0-9a-z_]*$")]

class DataNotList(pydantic.BaseModel):
    regex: Regex

data = DataNotList(**{"regex": "abc"})
print(data)
# regex='abc'
print(data.json())
# {"regex": "abc"}

Mypy treats Annotated[str, Field(regex="^[0-9a-z_]*$")] as a type alias of str. But it also tells pydantic to do validation. This is described in the pydantic docs here.

Unfortunately, it does not currently work with the following:

class Data(pydantic.BaseModel):
    regex: List[Regex]

The validation simply does not run for the list entries. This is an open bug (GitHub issue). Once the bug is fixed, this might be the best solution overall.

Note that typing.Annotated is only available since Python 3.9. For older Python versions typing_extensions.Annotated can be used.
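To make the "type alias of str" point concrete without depending on pydantic, here is a minimal sketch using only the typing machinery. RegexMarker and Holder are hypothetical stand-ins that I made up for illustration (RegexMarker plays the role of Field(regex=...)); the version check selects the typing_extensions backport on older interpreters.

```python
import sys

if sys.version_info >= (3, 9):
    from typing import Annotated, get_type_hints
else:
    # Third-party backport for older Python versions.
    from typing_extensions import Annotated, get_type_hints


class RegexMarker:
    """Hypothetical stand-in for the metadata that Field(regex=...) carries."""

    def __init__(self, regex: str) -> None:
        self.regex = regex


Regex = Annotated[str, RegexMarker("^[0-9a-z_]*$")]


class Holder:
    regex: Regex


# Without extras the annotation collapses to plain str,
# which is the view a static type checker takes.
plain = get_type_hints(Holder)

# With extras the metadata is still attached, which is what a
# runtime library can inspect to decide how to validate.
full = get_type_hints(Holder, include_extras=True)
marker = full["regex"].__metadata__[0]

print(plain["regex"], marker.regex)
```

The same annotation therefore carries two views: str for the type checker, and str plus metadata for whatever reads the annotation at runtime.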


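As a side note: I've used ^[0-9a-z_]*$ instead of [0-9a-z_]* for the regex. pydantic uses re.match for validation, which only requires a match at the start of the string, and since * also matches zero characters, the unanchored pattern would accept any string as valid.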
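The difference is easy to see with the standard library alone. This is a small sketch; the sample string and variable names are arbitrary choices of mine.

```python
import re

unanchored = re.compile("[0-9a-z_]*")
anchored = re.compile("^[0-9a-z_]*$")

bad = "NOT valid!"

# re.match only anchors at the start, and "*" may match zero characters,
# so the unanchored pattern "matches" an empty prefix of any string.
unanchored_hit = unanchored.match(bad)

# With "$" the whole string has to consist of allowed characters.
anchored_hit = anchored.match(bad)

good_hit = anchored.match("abc_123")

print(repr(unanchored_hit.group(0)), anchored_hit, good_hit is not None)
```

The unanchored pattern reports a match of the empty string, while the anchored one rejects the input and still accepts a well-formed value.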

mihi