I've been looking for robust type hints for a pandas DataFrame, but cannot seem to find anything useful. This question barely scratches the surface: Pythonic type hints with pandas?
Normally, if I want to hint the type of a function that takes a DataFrame as an input argument, I would do:
import pandas as pd
def func(arg: pd.DataFrame) -> int:
    return 1
What I cannot seem to find is how to type hint a DataFrame with mixed dtypes. The DataFrame constructor only accepts a single dtype for the complete DataFrame, so to my knowledge per-column dtypes can only be set afterwards with the pd.DataFrame().astype(dtype={}) method.
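To make that concrete, here is a minimal sketch of setting per-column dtypes after construction. The column names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
df = pd.DataFrame({"integer": [1.0, 2.0], "flag": [0, 1]})

# The constructor's dtype argument applies one dtype to the whole frame,
# so per-column dtypes are assigned afterwards with a column -> dtype mapping.
df = df.astype(dtype={"integer": "int64", "flag": "bool"})

print(df.dtypes)
```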
This works, but doesn't seem very Pythonic to me:
import datetime
def func(arg: pd.DataFrame(columns=['integer', 'date']).astype(dtype={'integer': int, 'date': datetime.date})) -> int:
    return 1
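As far as I can tell, "works" here only means that Python accepts the expression as an annotation. The sketch below, which assumes a hypothetical `schema` variable and uses "datetime64[ns]" as a stand-in for the date column, shows that the annotation ends up as a DataFrame instance rather than a type, and that nothing is checked when the function is called.

```python
import pandas as pd

# Hypothetical schema frame; "datetime64[ns]" stands in for the date column's dtype.
schema = pd.DataFrame(columns=["integer", "date"]).astype(
    dtype={"integer": "int64", "date": "datetime64[ns]"}
)


def func(arg: schema) -> int:
    return 1


# The annotation is an empty DataFrame instance, not a type.
annotation = func.__annotations__["arg"]
print(isinstance(annotation, pd.DataFrame))

# Python does not enforce annotations, so a mismatching argument is accepted.
result = func("not a DataFrame")
```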
I came across this package: https://pypi.org/project/dataenforce/ with examples such as this one:
def process_data(data: Dataset["id": int, "name": object, "latitude": float, "longitude": float]):
    pass
This looks somewhat promising, but sadly the project is old and buggy.
As a data scientist building a machine learning application with long ETL processes, I believe that type hints are important.
What do you use, and does anybody type hint their DataFrames in pandas?