Say I want to run this code with type hints:

import numpy as np
import pandas as pd
import databricks.koalas as ks

def foo(df):
    """A very simple function which only adds 3 days to one
    of the dataframe's datetime columns.
    """
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df

# Creating a dummy dataframe
n_cols = 3
df = pd.concat(
    [pd.Series(pd.date_range('20200101', '20200105')) for i in range(n_cols)],
    keys=[f'col{i}' for i in range(n_cols)], axis=1)
df['group'] = [0, 0, 0, 1, 1]
df['name'] = ['s', 'dfgdfgg', 'd', 'd', 's']

# Using koalas groupby.apply mechanism without type hinting
res = ks.DataFrame(df).groupby('group').apply(foo)

The original dtypes:

>>> ks.DataFrame(df).dtypes

col0     datetime64[ns]
col1     datetime64[ns]
col2     datetime64[ns]
group             int64
name             object

If I run it as is, the existing dtypes stay the same after groupby.apply, and the new time column is datetime64[ns] as well:

>>> res.dtypes

col0     datetime64[ns]
col1     datetime64[ns]
col2     datetime64[ns]
group             int64
name             object
time     datetime64[ns]

The best working version I have with type hinting is currently this:

def foo(df) -> pd.DataFrame['col1': np.datetime64, 'col2': np.datetime64,
                            'col3': np.datetime64, 'group': int, 'name': str]:
    df['time'] = df['col1'] + pd.Timedelta('3D')
    return df

res = ks.DataFrame(df).groupby('group').apply(foo)

But the returned dtypes are a bit different.

>>> res.dtypes

col1      datetime64
col2      datetime64
col3      datetime64
group          int64
name             <U0

Is there a way to get the exact "datetime64[ns]" and "object" dtypes?

Eran
  • `ns` is the default unit for `datetime64`, so are you sure there's a difference? likewise ` – Josh Friedlander Dec 21 '21 at 13:59
  • @JoshFriedlander I'm not entirely sure this matters. I'm just concerned this will cause compatibility issues. Btw, unfortunately in Koalas, the default is `s` and not `ns`, which makes the default `datetime64[ns]` dtype even more peculiar. – Eran Dec 22 '21 at 14:42
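A minimal sketch of where the `datetime64` and `<U0` labels might come from. It assumes (not confirmed anywhere in this thread) that the hint types end up being converted with something equivalent to `np.dtype(...)`. It only uses NumPy and does not call Koalas, so it says nothing about which hints Koalas actually accepts:

```python
import numpy as np

# The hint types used in the question, converted to NumPy dtypes.
generic_dt = np.dtype(np.datetime64)  # datetime64 with no unit attached
str_dt = np.dtype(str)                # unicode string dtype of length 0

# The dtypes pandas reports for the original frame.
ns_dt = np.dtype("datetime64[ns]")    # datetime64 with an explicit ns unit
obj_dt = np.dtype(object)

# Compare the unit carried by each datetime dtype.
print(generic_dt.name, np.datetime_data(generic_dt))
print(ns_dt.name, np.datetime_data(ns_dt))

# Compare the string dtype with the object dtype.
print(str_dt.kind, str_dt.itemsize)
print(obj_dt.name)

# The unitless and the ns dtype are distinct dtypes.
print(generic_dt == ns_dt)
```

If that assumption holds, the bare `np.datetime64` and `str` hints would be distinct dtypes from `datetime64[ns]` and `object`, which would be consistent with the output in the question.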
