
I created the following dataframe:

import pandas as pd
import databricks.koalas as ks
df = ks.DataFrame(
    {'Date1': pd.date_range('20211101', '20211110', freq='1D'), 
     'Date2': pd.date_range('20201101', '20201110', freq='1D')})
df

Out[0]:

       Date1      Date2
0 2021-11-01 2020-11-01
1 2021-11-02 2020-11-02
2 2021-11-03 2020-11-03
3 2021-11-04 2020-11-04
4 2021-11-05 2020-11-05
5 2021-11-06 2020-11-06
6 2021-11-07 2020-11-07
7 2021-11-08 2020-11-08
8 2021-11-09 2020-11-09
9 2021-11-10 2020-11-10

When trying to get the minimum of Date1, I get the correct result:

df.Date1.min()

Out[1]:

Timestamp('2021-11-01 00:00:00')

Also, when trying to get the minimum value of each row, the correct result is returned:

df.min(axis=1)

Out[2]:

0   2020-11-01
1   2020-11-02
2   2020-11-03
3   2020-11-04
4   2020-11-05
5   2020-11-06
6   2020-11-07
7   2020-11-08
8   2020-11-09
9   2020-11-10
dtype: datetime64[ns]

However, using the same function across columns fails, returning an empty Series:

df.min(axis=0)

Out[3]:

Series([], dtype: float64)

Does anyone know why this is and if there's an elegant way around it?


2 Answers


Try this:

df.apply(min, axis=0)

Out[1]:

Date1   2021-11-01
Date2   2020-11-01
dtype: datetime64[ns]
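For comparison, the same column-wise reduction works directly in plain pandas. Here is a minimal sketch on an equivalent pandas DataFrame (no Koalas involved), which suggests the empty result is specific to Koalas rather than to pandas semantics:

```python
import pandas as pd

# Equivalent plain-pandas DataFrame with the same two datetime columns.
pdf = pd.DataFrame(
    {'Date1': pd.date_range('20211101', '20211110', freq='1D'),
     'Date2': pd.date_range('20201101', '20201110', freq='1D')})

# Column-wise minima work directly in pandas, no apply() workaround needed.
print(pdf.min(axis=0))
# Date1   2021-11-01
# Date2   2020-11-01
# dtype: datetime64[ns]
```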

This was indeed a bug in the Koalas code, but Koalas has since been merged into PySpark, and the pandas-on-Spark API was born. More information here.

With Spark 3.2.0 and above, one needs to replace

import databricks.koalas as ks

with

import pyspark.pandas as ps

and replace ks.DataFrame with ps.DataFrame. This completely eliminates the issue.
