I have a rolling window computation that works in pandas but not in koalas, and I am wondering why:
import pandas as pd
import databricks.koalas as ks
Timestamp = pd.Timestamp
df = pd.DataFrame([[Timestamp('2022-05-18 18:10:50.021831300'), '65.78.97'],
                   [Timestamp('2022-05-24 09:48:39.787426700'), '65.78.97'],
                   [Timestamp('2022-05-24 16:06:18.765405500'), '65.78.97'],
                   [Timestamp('2022-05-25 03:04:01.860841300'), '65.78.97'],
                   [Timestamp('2022-05-26 05:01:08.335874700'), '47.41.196'],
                   [Timestamp('2022-05-31 03:57:15.060167500'), '47.41.196'],
                   [Timestamp('2022-05-31 06:32:37.177199300'), '47.41.196']],
                  columns=['time', 'ip'])
print(pd.DataFrame(df).groupby('ip').apply(lambda x: pd.Series([1]*len(x), index=x['time']).rolling(window='24h', min_periods=1).sum()))
As expected, for each session this gives the number of sessions from the same IP (with the last octet removed) within the preceding 24 hours:
ip         time
47.41.196  2022-05-26 05:01:08.335874700    1.0
           2022-05-31 03:57:15.060167500    1.0
           2022-05-31 06:32:37.177199300    2.0
65.78.97   2022-05-18 18:10:50.021831300    1.0
           2022-05-24 09:48:39.787426700    1.0
           2022-05-24 16:06:18.765405500    2.0
           2022-05-25 03:04:01.860841300    3.0
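To make the semantics I am after explicit, here is a minimal plain-Python sketch of the same count, independent of pandas and koalas. It assumes the window is open on the left and closed on the right, i.e. (t - 24h, t], which I believe is the pandas default for offset windows. The function name `rolling_counts` is hypothetical, and the timestamps are truncated to whole seconds for brevity.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def rolling_counts(rows, window=timedelta(hours=24)):
    """For each (time, ip) row, count rows with the same ip in (time - window, time]."""
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    result = []
    for time, ip in sorted(rows):  # process rows in chronological order
        times = recent[ip]
        times.append(time)
        # Drop timestamps that are a full window (or more) older than the current one.
        while times[0] <= time - window:
            times.popleft()
        result.append((ip, time, len(times)))
    return result


# Same sessions as in the DataFrame above, truncated to whole seconds (assumption).
rows = [
    (datetime(2022, 5, 18, 18, 10, 50), "65.78.97"),
    (datetime(2022, 5, 24, 9, 48, 39), "65.78.97"),
    (datetime(2022, 5, 24, 16, 6, 18), "65.78.97"),
    (datetime(2022, 5, 25, 3, 4, 1), "65.78.97"),
    (datetime(2022, 5, 26, 5, 1, 8), "47.41.196"),
    (datetime(2022, 5, 31, 3, 57, 15), "47.41.196"),
    (datetime(2022, 5, 31, 6, 32, 37), "47.41.196"),
]

for ip, time, count in rolling_counts(rows):
    print(ip, time, count)
```

The rows come out in chronological order rather than grouped by IP, but the per-row counts should match the pandas result above.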
However, if I use koalas for the same computation, I get an error:
print(ks.DataFrame(df).groupby('ip').apply(lambda x: ks.Series([1]*len(x), index=x['time']).rolling(window='1d', min_periods=1).sum()))
TypeError: '<' not supported between instances of 'str' and 'int'
Any help or advice will be greatly appreciated!