
My function returns a pandas series, where all elements have a specific type (say str). The following MWE should give an impression:

import pandas as pd 
def f() -> pd.Series:
    return pd.Series(['a', 'b']) 

Within the type hints I want to make clear that f()[0] will always be of type str (compared, for example, to a function that would return pd.Series([0, 1])). I tried this:

def f() -> pd.Series[str]:

But this raises:

TypeError: 'type' object is not subscriptable

So, how can I specify the type of pandas Series elements in type hints? Any ideas?

Qaswed
  • `pd.Series(dtype=str)` allows you to specify the data type of a series' elements. My guess is that this also works for type hints. – Swier Sep 09 '19 at 13:26
  • `pd.Series(dtype=str)` does not work for type hints. – Itamar Mushkin Sep 09 '19 at 13:32
  • Is there a "str" type in pandas? Not sure, according to https://pbpython.com/pandas_dtypes.html (but maybe deprecated?) – Doe Jowns Sep 09 '19 at 13:33
  • @ItamarMushkin: just out of curiosity, why do you think `pd.Series(dtype=str)` does not work for type hints? My 3.7 interpreter at least accepts it syntactically. – jottbe Sep 09 '19 at 15:44
  • @jottbe -- it's not a valid PEP 484 type. So while there's nothing stopping you from writing such a type hint, it would end up causing any tooling designed to analyze PEP 484 type hints to choke (static type checkers, linters, autocompletion tools...). Losing access to those tools would greatly diminish the usefulness of type hints to the point where you're probably better off not using them at all. – Michael0x2a Sep 09 '19 at 20:06
  • @Michael0x2a: ok I see. Thank you for the explanation. – jottbe Sep 09 '19 at 22:54
  • Also, it didn't run for me on 3.6.1 (Jupyter notebook if that matters) – Itamar Mushkin Sep 11 '19 at 04:28

7 Answers


For Python 3.8, try:

def f() -> "pd.Series[str]":
    pass

or:

f_return_type = "pd.Series[str]"
def f() -> f_return_type:
    pass

Or, for variables, use a type comment: # type: pd.Series[str]
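
A minimal sketch of why the quoted form avoids the error. It uses a hypothetical stand-in class named `Series` rather than pandas itself (an assumption, made so the snippet runs without third-party packages). A string annotation is stored as written and is not evaluated when the function is defined; PEP 484 describes this under forward references.

```python
class Series:
    """Hypothetical stand-in for a class that cannot be subscripted at runtime."""


def f() -> "Series[str]":  # quoted, so Python stores the text without evaluating it
    return Series()


# The annotation is kept as a plain string for type checkers to interpret.
print(f.__annotations__["return"])

# Evaluating the same expression directly fails, as in the question.
try:
    Series[str]
except TypeError as exc:
    print("TypeError:", exc)
```

Whether a given type checker understands `pd.Series[str]` inside the string still depends on the pandas type stubs it has available.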

MrFogszi
  • This works, but where can I find this type of annotation (inside strings) in the Python docs? I didn't find them in the PEP, and the typing docs have only two examples with no information about them: https://docs.python.org/3/library/typing.html – Ziur Olpa Nov 25 '22 at 14:17

You can use pandera to type-hint and validate DataFrames and Series: https://pandera.readthedocs.io/en/stable/schema_models.html#schema-models

So in this case:

from pandera.typing import Series
import pandas as pd 

def f() -> Series[str]:
    return pd.Series(['a', 'b']) 
Peter Prescott

I want to make clear that f()[0] will always be of type str (compared to a function that would return pd.Series([0, 1]))

This may be a good case for annotating the type based on how the value will be used rather than what it is (a "has method" annotation instead of an "is type" annotation).

In this case, the indexing behavior is covered by the Sequence type.

from typing import Sequence
import pandas as pd

def returns_str_sequence() -> Sequence[str]:
    return pd.Series(['a', 'b'])

def uses_str_sequence(data: Sequence[str]) -> str:
    for _ in data:
        pass  # Iterable behavior also covered
    return data[0]  # indexing works via __getitem__

For a fuller list of the abstract types you can use, see the documentation for the collections.abc module.

This may also have the side benefit of decoupling your code from third-party code and types, since your functions will be defined in terms of more abstract types.
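
To illustrate the decoupling point, here is a small sketch in which a function annotated with `Sequence[str]` accepts any indexable sequence of strings, shown with built-in containers only. The name `first_item` is hypothetical. The sketch does not show that a static checker will accept a pd.Series in that position; that depends on the pandas type stubs in use and is worth checking with your own tooling.

```python
from typing import Sequence


def first_item(data: Sequence[str]) -> str:
    """Return the first element of any indexable sequence of strings."""
    return data[0]


# The same function serves different concrete containers.
print(first_item(["a", "b"]))  # list
print(first_item(("c", "d")))  # tuple
```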

DataWizard

Unfortunately, Python's type hinting does not support this out of the box. Nonetheless, you can use the dataenforce library to add hints or even enforce validation.

ibarrond
  • Can you show how this would actually be done for `pd.Series`? Is this done using `DatasetMeta` (https://github.com/CedricFR/dataenforce/blob/master/dataenforce/__init__.py)? If yes, how? – Qaswed Jan 08 '20 at 15:36

You can utilize typing.TypeVar to accomplish this:

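import pandas as pd
from typing import TypeVar

# The string is only a label; it does not make static checkers treat the
# return value as a Series of str.
SeriesString = TypeVar('pandas.core.series.Series(str)')

def f() -> SeriesString:
    return pd.Series(['a', 'b'])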
phoenix
zfact0rial

Your example should work in Python >= 3.9.

To get it to work in Python 3.8 (and 3.7, where this import was introduced), add the following at the top of the module:

from __future__ import annotations

The PEPs around typing continue to evolve, as does Python...
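
As a rough sketch of what the future import changes: with it, annotations in the module are stored as strings instead of being evaluated when a function is defined. The snippet below runs a small module body through `exec` only so that it can be pasted anywhere (in a real file the import has to come before any other statement), and `Series` is a hypothetical stand-in class rather than pandas.

```python
# Source of a tiny module that starts with the future import.
module_source = '''
from __future__ import annotations

class Series:
    """Hypothetical stand-in for a class that cannot be subscripted at runtime."""

def f() -> Series[str]:  # unquoted, yet not evaluated at definition time
    return Series()
'''

namespace = {}
exec(module_source, namespace)  # defining f does not raise TypeError

# The annotation was stored as text rather than evaluated.
print(namespace["f"].__annotations__["return"])
```

Tools that evaluate annotations later, for example through `typing.get_type_hints`, would still hit the TypeError if the class cannot be subscripted at runtime.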

dsz

You can specify the element type using the dtype parameter:

import pandas as pd
data = pd.Series(['a', 'b'], dtype='str') 

for more information click here

Mohan
  • Hi Mohan, thank you for your answer. Unfortunately, the solution does not work, since `data` is an unresolved reference, or am I missing something? – Qaswed Jan 20 '21 at 13:01
  • Hi @Qaswed, `data` is the data you want to add to the series. For your problem, the code below will work. The data type will show as object; the str data type is also an object type: `import pandas as pd; pd.Series(['a', 'b'], dtype='str')` – Mohan Jan 25 '21 at 08:21