How to specify variable type of a pandas Series (string or TypeVar)?

Question

I want to use type hinting for something like:

def fo() -> pd.Series[np.float64]:
   return pd.Series(np.float64[0])

This won't work.

From this answer: How to specify the type of pandas series elements in type hints?

I understand I can use either:

def fo() -> "pd.Series[np.float64]":
   return pd.Series(np.float64[0])

Or:

from typing import (
    TypeVar
)

SeriesFloat64 = TypeVar('pd.Series[np.float64]')
def fo() -> SeriesFloat64:
   return pd.Series(np.float64[0])

Why should I prefer one over the other?

I think that would be a misuse of a `TypeVar` and not give the checker any information - the first argument is just a _name_: https://docs.python.org/3/library/typing.html#generics, https://docs.python.org/3/library/typing.html#typing.TypeVar. — jonrsharpe, Sep 21 '22 at 11:35
I assume you meant to return a `pd.Series(np.float64(0))`, not with the square brackets around the `0`, to instantiate a float with the value zero. — Daniil Fajnberg, Sep 22 '22 at 07:36

score 1 · Accepted Answer · answered Sep 22 '22 at 09:01

Both "solutions" you referenced are wrong

I'll start with the second one:

from typing import TypeVar
import numpy as np, pandas as pd

SeriesFloat64 = TypeVar('pd.Series[np.float64]')

def fo() -> SeriesFloat64:
    return pd.Series(np.float64(0))

Is this type variable technically a valid annotation? Yes. Does it specify the generic pd.Series? No.

First of all, as @jonrsharpe pointed out, this breaks the convention of initializing your type variables with a name argument that corresponds to the actual name of the variable. More importantly, neither bound nor constraints have been specified, meaning you might as well have just written this:

from typing import TypeVar
import numpy as np, pandas as pd

T = TypeVar("T")  # which is the same as `TypeVar("T", bound=typing.Any)`

def fo() -> T:
    return pd.Series(np.float64(0))

This would at least fix the name issue, but it would not specify anything about the return type of fo(). In fact, mypy will correctly point out the following:

error: Incompatible return value type (got "Series[Any]", expected "T")  [return-value]
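
For contrast, here is a minimal sketch of what a type variable is normally for: linking the type of an argument to the type of the return value. The function name `first` is made up for illustration and is not part of the question. In `fo()` above, `T` appears only in the return annotation, so there is nothing for the checker to solve it from.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # the name argument matches the variable name


def first(items: Sequence[T]) -> T:
    """Return the first element; T ties the return type to the element type."""
    return items[0]


# A static checker is expected to infer float for the first call
# and str for the second, because T is solved from the argument.
print(first([1.5, 2.5]))
print(first(["a", "b"]))
```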

That mypy error already hints at what we can and cannot do when specifying pd.Series, which leads us to the first "solution":

import numpy as np, pandas as pd

def fo() -> "pd.Series[np.float64]":
    return pd.Series(np.float64(0))

By the way, this is equivalent (no quotes needed):

from __future__ import annotations
import numpy as np, pandas as pd

def fo() -> pd.Series[np.float64]:
    return pd.Series(np.float64(0))

This is wrong because the type parameter for the generic Series does not accept np.float64. Again, mypy points this out:

error: Value of type variable "S1" of "Series" cannot be "floating"  [type-var]

If we check the pandas-stubs source for core.series.Series (as of today), we see that Series inherits from typing.Generic[S1]. The definition of _typing.S1 lists the constraints on that type variable: a numpy float is not among them, but the built-in float is. What does that mean?

Well, we know that np.float64 does inherit from the regular float, but it also inherits from np.floating, and that is the issue. As this section of PEP 484 explains, unlike an upper bound,

type constraints cause the inferred type to be exactly one of the constraint types[.]

This means that you are not allowed to use np.float64 in place of the aforementioned S1 to specify a Series type.
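
To make the difference concrete without depending on pandas-stubs, here is a small sketch using stand-in classes. `Base`, `Child`, `Box` and `S` are hypothetical names that only mirror the roles of float, np.float64, Series and S1 by analogy. The comments describe what a checker such as mypy is expected to report; at runtime Python does not enforce any of this.

```python
from typing import Generic, TypeVar


class Base: ...


class Child(Base): ...


# Constrained type variable: it must resolve to exactly Base or exactly str.
# (A bounded one, TypeVar("S", bound=Base), would accept Child as well.)
S = TypeVar("S", Base, str)


class Box(Generic[S]):
    def __init__(self, item: S) -> None:
        self.item = item


# Accepted by a checker: S is solved as Base, and a Child instance is a Base.
ok: "Box[Base]" = Box(Child())

# Writing the annotation "Box[Child]" instead is expected to be rejected,
# because Child is a subclass of a constraint, not one of the constraints.

print(S.__constraints__)  # the two constraint classes
print(S.__bound__)        # None, since no upper bound was given
```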

A better way

The "most correct" way to type hint that function, in my opinion, is to do this:

from __future__ import annotations
import numpy as np, pandas as pd

def fo() -> pd.Series[float]:
    return pd.Series(np.float64(0))

This uses the Series generic correctly: the type argument satisfies the defined constraints, and it describes the element type of the returned series as closely as possible, since np.float64 does inherit from float.

It also passed the strict mypy check.

Adding useful information

One caveat of this is that you lose the information that the series actually contains 64-bit floats. If you want that because you want the signature/documentation for the function to reflect that nuance, you can simply set up a custom type alias:

from __future__ import annotations
import numpy as np, pandas as pd

float64 = float

def fo() -> pd.Series[float64]:
    return pd.Series(np.float64(0))

Now calling help(fo) gives this:

...
fo() -> 'pd.Series[float64]'

But it is important to note that this is just for your benefit and does absolutely nothing for the static type checker.
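
A quick way to convince yourself of that is the sketch below. It uses the built-in list as a stand-in for pd.Series so that it runs without pandas (that substitution is my assumption, not part of the original example), and a quoted annotation in place of the `__future__` import.

```python
float64 = float  # plain alias: both names refer to the same class object


def fo() -> "list[float64]":  # list stands in for pd.Series in this sketch
    return [0.0]


# The alias is not a new type, so a checker sees nothing but float.
print(float64 is float)

# The alias name survives only as text in the stored annotation,
# which is where help() gets its signature from.
print(fo.__annotations__["return"])
```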

Limitations of `pd.Series` types

Another thing worth mentioning is that as of today, there are no useful annotations on many of the methods of a pd.Series that return a single element, such as the __getitem__ method for access via the square brackets []. Say I do this:

...
series = fo()
x = series[0]
print(x, type(x))
y = int(x)

The output is 0.0 <class 'numpy.float64'>, but type checkers have no clue that x is a np.float64 or any float at all for that matter. (In fact, my PyCharm complains at y = int(x) because it thinks x is a timestamp for whatever reason.)

This is just to illustrate that as of now, you may not get any useful auto-suggestions when dealing with pd.Series, even if you annotate your types more or less correctly.

Hope this helps.

But `pd.Series[str]` doesn't parse in Python 3.10, at least for a function argument type: `E TypeError: 'type' object is not subscriptable` — rjurney, Apr 10 '23 at 18:38
@rjurney it does *parse* (if it didn't, it would raise a `SyntaxError` not a `TypeError`). The pandas data types do not support generic type annotations (who knows if they ever will), that's why this answer explained that you need to use either *strings* or to use `from __future__ import annotations` — juanpa.arrivillaga, Apr 10 '23 at 22:02
@juanpa.arrivillaga you can just use "pd.Series(str)" and it works ok. Your answer is fantastic, but that bit didn't work for me without parenthesis. — rjurney, Apr 19 '23 at 21:49
@rjurney `pd.Series(str)` **is not a valid type annotation**. Anyway, this isn't my answer — juanpa.arrivillaga, Apr 19 '23 at 21:52
@rjurney I don't know what you mean by "it works". If you mean, "it doesn't throw an error", then that's true, but neither does, say, "banana" — juanpa.arrivillaga, Apr 19 '23 at 21:59
@rjurney Your answer below and your use of `TypeVar` indicate that you are very confused about the basics of type annotations and the typing constructs provided in Python. I would suggest you read through [PEP 483](https://peps.python.org/pep-0483) (theory) and [PEP 484](https://peps.python.org/pep-0484) (implementations). I think I explained well enough what limitations we face with the Pandas classes and how we can work around that and juanpa.arrivillaga even addressed what you misunderstood. If you have further questions, feel free to post them and link them in a comment here. — Daniil Fajnberg, Apr 20 '23 at 08:06
