Say I want to run this code with type hints:
import numpy as np
import pandas as pd
import databricks.koalas as ks

def foo(df):
    """A very simple function which only adds 3 days to one
    of the dataframe's datetime columns.
    """
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df
# Creating a dummy dataframe
n_cols = 3
df = pd.concat(
    [pd.Series(pd.date_range('20200101', '20200105')) for i in range(n_cols)],
    keys=[f'col{i}' for i in range(n_cols)],
    axis=1,
)
df['group'] = [0, 0, 0, 1, 1]
df['name'] = ['s', 'dfgdfgg', 'd', 'd', 's']
# Using koalas groupby.apply mechanism without type hinting
res = ks.DataFrame(df).groupby('group').apply(foo)
The original dtypes:
>>> ks.DataFrame(df).dtypes
col0 datetime64[ns]
col1 datetime64[ns]
col2 datetime64[ns]
group int64
name object
If I run it as is, the existing columns keep their dtypes after groupby.apply:
>>> res.dtypes
col0 datetime64[ns]
col1 datetime64[ns]
col2 datetime64[ns]
group int64
name object
time datetime64[ns]
The best working version I have with type hinting is currently this:
def foo(df) -> pd.DataFrame['col1': np.datetime64, 'col2': np.datetime64,
                            'col3': np.datetime64, 'group': int, 'name': str]:
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df
res = ks.DataFrame(df).groupby('group').apply(foo)
But the returned dtypes are slightly different:
>>> res.dtypes
col1 datetime64
col2 datetime64
col3 datetime64
group int64
name <U0
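The names in this output look like what plain NumPy reports when the Python types used in the hint are converted to dtypes. The sketch below only illustrates that NumPy behaviour. Whether koalas resolves the hint this way internally is an assumption, not something confirmed here.

```python
import numpy as np

# Types as written in the return hint, converted with plain NumPy.
# Assumption: this mirrors the names seen above; it is not koalas' own code path.
hinted_datetime = np.dtype(np.datetime64)  # generic datetime64, no time unit
hinted_str = np.dtype(str)                 # zero-width unicode string dtype

# Dtypes of the original dataframe columns.
wanted_datetime = np.dtype('datetime64[ns]')
wanted_object = np.dtype(object)

# Compare the names side by side: hinted type vs. original column dtype.
print(hinted_datetime.name, '|', wanted_datetime.name)
print(hinted_str.str, '|', wanted_object.name)

# The generic and the nanosecond datetime dtypes are not equal.
print(hinted_datetime == wanted_datetime)
```

In NumPy, the generic datetime64 and datetime64[ns] compare as different dtypes, and str maps to a unicode dtype rather than object, which is consistent with the mismatch shown above.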
Is there a way to write the type hints so that the result has exactly the "datetime64[ns]" and "object" dtypes?