I created the following dataframe:
import pandas as pd
import databricks.koalas as ks
df = ks.DataFrame(
    {'Date1': pd.date_range('20211101', '20211110', freq='1D'),
     'Date2': pd.date_range('20201101', '20201110', freq='1D')})
df
Out[0]:
|   | Date1 | Date2 |
|---|---|---|
| 0 | 2021-11-01 | 2020-11-01 |
| 1 | 2021-11-02 | 2020-11-02 |
| 2 | 2021-11-03 | 2020-11-03 |
| 3 | 2021-11-04 | 2020-11-04 |
| 4 | 2021-11-05 | 2020-11-05 |
| 5 | 2021-11-06 | 2020-11-06 |
| 6 | 2021-11-07 | 2020-11-07 |
| 7 | 2021-11-08 | 2020-11-08 |
| 8 | 2021-11-09 | 2020-11-09 |
| 9 | 2021-11-10 | 2020-11-10 |
When I take the minimum of the Date1 column, I get the correct result:
df.Date1.min()
Out[1]:
Timestamp('2021-11-01 00:00:00')
Taking the minimum of each row (axis=1) also returns the correct result:
df.min(axis=1)
Out[2]:
0 2020-11-01
1 2020-11-02
2 2020-11-03
3 2020-11-04
4 2020-11-05
5 2020-11-06
6 2020-11-07
7 2020-11-08
8 2020-11-09
9 2020-11-10
dtype: datetime64[ns]
However, applying the same method column-wise (axis=0) returns an empty Series instead of the minimum of each column:
df.min(axis=0)
Out[3]:
Series([], dtype: float64)
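For comparison, here is a minimal sketch of the result I expected, using plain pandas instead of Koalas on the same data. The names `pdf` and `expected` are introduced only for this illustration.

```python
import pandas as pd

# Same data as above, but as a plain pandas DataFrame (no Spark session needed).
pdf = pd.DataFrame(
    {'Date1': pd.date_range('20211101', '20211110', freq='1D'),
     'Date2': pd.date_range('20201101', '20201110', freq='1D')})

# Column-wise minimum: one Timestamp per column, indexed by column name.
expected = pdf.min(axis=0)
print(expected)
```

In pandas this gives one entry per column (the earliest date in Date1 and in Date2), which is what I hoped Koalas would return as well.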
Does anyone know why this happens, and whether there is an elegant way around it?
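A possible workaround, sketched under the assumption that the per-column `Series.min()` call keeps working as it does in `Out[1]`, is to reduce each column separately. The helper name `columnwise_min` is hypothetical, and the example runs on plain pandas so that it does not need a Spark session. That the same function applies unchanged to a Koalas DataFrame is an assumption, not something confirmed here.

```python
import pandas as pd

def columnwise_min(frame):
    """Return {column name: minimum} by reducing one column at a time."""
    # Hypothetical helper: it relies only on frame.columns and Series.min(),
    # and avoids calling DataFrame.min(axis=0) altogether.
    return {col: frame[col].min() for col in frame.columns}

# Stand-in data in plain pandas; a Koalas DataFrame is assumed to behave alike.
pdf = pd.DataFrame(
    {'Date1': pd.date_range('20211101', '20211110', freq='1D'),
     'Date2': pd.date_range('20201101', '20201110', freq='1D')})

mins = columnwise_min(pdf)
print(mins)
```

This is not especially elegant: on Koalas each `min()` call would likely run as its own Spark job, so it may be slow for frames with many columns.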