dtype={"ColA": str}
----------------------------------------------
use_koalas: True
df:
   ColA  ColB   ColC
0     A     0   0.00
1  None     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df['ColA'][1]): <class 'NoneType'>
df[df.notna()]:
   ColA  ColB   ColC
0     A     0   0.00
1  None     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df['ColA'][1]): <class 'NoneType'>
df = df[df.notna()].astype(dtype)
df:
   ColA  ColB   ColC
0     A     0   0.00
1  None     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df['ColA'][1]): <class 'NoneType'>
----------------------------------------------
use_koalas: False
df:
   ColA  ColB   ColC
0     A     0   0.00
1  None     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df['ColA'][1]): <class 'NoneType'>
df[df.notna()]:
   ColA  ColB   ColC
0     A     0   0.00
1   NaN     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df[df.notna()]['ColA'][1]): <class 'float'>
df = df[df.notna()].astype(dtype)
df:
   ColA  ColB   ColC
0     A     0   0.00
1   nan     1  12.30
2     C     2  22.20
3     D     1   3.14
type(df['ColA'][1]): <class 'str'>
----------------------------------------------
I've experimented with using "string" as the dtype instead of str, but that has some downstream effects. This is a very large dataset, so ideally I would not use the mask function. Why do the pandas and Koalas DataFrames and functions behave differently here?
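
For reference, here is a minimal pandas-only sketch of the behavior I am after: cast only the non-missing values in ColA to str and leave the missing entry missing, so it never becomes the literal string "nan". The helper name `to_str_keep_missing` is hypothetical (my own, not part of pandas or Koalas), and I have not checked how this behaves on a Koalas DataFrame.

```python
import pandas as pd

# Same toy data as in the output above.
df = pd.DataFrame({
    "ColA": ["A", None, "C", "D"],
    "ColB": [0, 1, 2, 1],
    "ColC": [0.00, 12.30, 22.20, 3.14],
})


def to_str_keep_missing(series):
    """Hypothetical helper: cast non-missing values to str, keep missing values missing."""
    return series.map(lambda v: v if pd.isna(v) else str(v))


df["ColA"] = to_str_keep_missing(df["ColA"])

# Row 1 should still be reported as missing, and every other value should be a str.
missing_flags = df["ColA"].isna().tolist()
non_missing_types = {type(v) for v in df["ColA"].dropna()}
print(missing_flags)
print(non_missing_types)
```

This calls a Python function on every element, so it is probably not something I would want to run on the full dataset; it is only meant to pin down the expected result.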