Both "solutions" you referenced are wrong
I'll start with the second one:
from typing import TypeVar
import numpy as np, pandas as pd
SeriesFloat64 = TypeVar('pd.Series[np.float64]')
def fo() -> SeriesFloat64:
    return pd.Series(np.float64(0))
Is this type variable technically a valid annotation? Yes. Does it specify the generic pd.Series? No.
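A quick runtime check makes that "No" concrete. This is only a minimal sketch using the standard library; the variable name mirrors the one above. It suggests that the string passed to TypeVar is just a label and is never parsed as a type:

```python
from typing import TypeVar

# The string is only the type variable's display name; it is not evaluated.
SeriesFloat64 = TypeVar("pd.Series[np.float64]")

print(SeriesFloat64.__name__)         # the raw label, exactly as passed in
print(SeriesFloat64.__bound__)        # None: no upper bound was given
print(SeriesFloat64.__constraints__)  # (): no constraints were given
```

Nothing in that object refers to pandas or numpy, which is why a type checker cannot relate it to Series.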
First of all, as @jonrsharpe pointed out, this breaks the convention of initializing a type variable with a name argument that matches the name of the variable it is assigned to. More importantly, neither bound nor constraints have been specified, meaning you might as well have just written this:
from typing import TypeVar
import numpy as np, pandas as pd
T = TypeVar("T") # which is the same as `TypeVar("T", bound=typing.Any)`
def fo() -> T:
    return pd.Series(np.float64(0))
This would at least fix the name issue, but it would not specify anything about the return type of fo(). In fact, mypy will correctly point out the following:
error: Incompatible return value type (got "Series[Any]", expected "T") [return-value]
That error already hints at what we can and cannot do when specifying pd.Series, and this leads us to the other "solution":
import numpy as np, pandas as pd
def fo() -> "pd.Series[np.float64]":
    return pd.Series(np.float64(0))
This is equivalent, by the way (no quotes needed):
from __future__ import annotations
import numpy as np, pandas as pd
def fo() -> pd.Series[np.float64]:
    return pd.Series(np.float64(0))
This is wrong because the type parameter of the generic Series does not accept np.float64. Again, mypy points this out:
error: Value of type variable "S1" of "Series" cannot be "floating" [type-var]
If we check out the pandas-stubs source for core.series.Series (as of today), we see that Series inherits from typing.Generic[S1]. And when we go to the definition of _typing.S1, we can see the constraints on that type variable. A numpy float is not among them, but we do find the built-in float. What does that mean?
Well, we know that np.float64 does inherit from the regular float, but it also inherits from np.floating, and that is the issue. As mentioned in this section of PEP 484, as opposed to an upper bound, "type constraints cause the inferred type to be exactly one of the constraint types[.]"
This means that you are not allowed to use np.float64 in place of the aforementioned S1 to specify a Series type.
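To make the "exactly one of the constraint types" rule more tangible, here is a small sketch that needs neither pandas nor numpy. All names in it (Marker, MyFloat, Box, S) are hypothetical stand-ins for np.floating, np.float64, pd.Series and S1, and the real S1 has more constraints than the two assumed here:

```python
from typing import Generic, TypeVar


class Marker:
    """Hypothetical extra base class, playing the role of np.floating."""


class MyFloat(float, Marker):
    """Hypothetical subclass of float, playing the role of np.float64."""


# Constrained type variable, loosely modelled on S1.
S = TypeVar("S", str, float)


class Box(Generic[S]):
    """Hypothetical generic container, playing the role of pd.Series."""

    def __init__(self, item: S) -> None:
        self.item = item


# Fine: the type argument is float, and a MyFloat instance is a float.
ok: Box[float] = Box(MyFloat(1.5))

# Writing Box[MyFloat] as an annotation still works at runtime, but a checker
# following PEP 484 is expected to reject it, because MyFloat is a subclass of
# a constraint type rather than one of the constraint types themselves.
```

Had S been declared with bound=float instead, Box[MyFloat] would be acceptable, since an upper bound only requires a subtype.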
A better way
The "most correct" way to type hint that function, in my opinion, is to do this:
from __future__ import annotations
import numpy as np, pandas as pd
def fo() -> pd.Series[float]:
    return pd.Series(np.float64(0))
This makes correct use of the Series generic by providing a type argument that is in line with the defined type constraints and that indicates an element type as close as you can get to the actual one, since np.float64 does inherit from float.
It also passes the strict mypy check.
Adding useful information
One caveat of this is that you lose the information that the series actually contains 64-bit floats. If you want that because you want the signature/documentation for the function to reflect that nuance, you can simply set up a custom type alias:
from __future__ import annotations
import numpy as np, pandas as pd
float64 = float
def fo() -> pd.Series[float64]:
    return pd.Series(np.float64(0))
Now calling help(fo) gives this:
...
fo() -> 'pd.Series[float64]'
But it is important to note that this is just for your benefit and does absolutely nothing for the static type checker.
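The following sketch illustrates why the alias is invisible to tooling. It assumes Python 3.9 or newer and uses the built-in list as a stand-in for pd.Series so that it runs without third-party packages; the function body is likewise simplified:

```python
from typing import get_type_hints

float64 = float  # plain alias: both names refer to the same class object


# list stands in for pd.Series here; the quoted annotation keeps the alias name.
def fo() -> "list[float64]":
    return [0.0]


print(float64 is float)               # the alias is not a distinct type
print(fo.__annotations__["return"])   # the string as written, alias included
print(get_type_hints(fo)["return"])   # the resolved type, where the alias is gone
```

The stored annotation string keeps the readable name, which is what help() displays, while resolving it yields plain float.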
Limitations of pd.Series types
Another thing worth mentioning is that, as of today, many of the pd.Series methods that return a single element have no useful annotations, such as the __getitem__ method used for access via square brackets []. Say I do this:
...
series = fo()
x = series[0]
print(x, type(x))
y = int(x)
The output is 0.0 <class 'numpy.float64'>, but type checkers have no clue that x is a np.float64, or any float at all for that matter. (In fact, my PyCharm complains at y = int(x) because it thinks x is a timestamp for whatever reason.)
This is just to illustrate that, as of now, you may not get any useful auto-suggestions when dealing with pd.Series, even if you annotate your types more or less correctly.
Hope this helps.