I want to split my PySpark dataframe into groups with a monotonically increasing trend and keep only the groups with more than 10 rows.
Here is the part of the code I have tried:
from pyspark.sql import functions as F, Window

# flag rows where x is greater than the previous x, per x1 partition ordered by time
df = df1.withColumn(
    "FLAG_INCREASE",
    F.when(
        F.col("x") > F.lag("x").over(Window.partitionBy("x1").orderBy("time")),
        1,
    ).otherwise(0),
)
I don't know how to group by consecutive ones in PySpark... does anyone have a better solution for this?
In pandas we can do the same thing like this:
df = df1.groupby((df1['x'].diff() < 0).cumsum())
How can I convert this code to PySpark?
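This is my rough attempt at the PySpark equivalent; I'm not sure it is correct. I dropped the partitionBy("x1") since the pandas line doesn't partition, and time is the ordering column from my real data (for the toy example below it would be idx instead):

from pyspark.sql import functions as F, Window

# ordered window over the whole frame (no partitionBy, like the pandas code);
# Spark warns that this pulls all the data into a single partition
w_order = Window.orderBy("time")
w_cum = Window.orderBy("time").rowsBetween(Window.unboundedPreceding, Window.currentRow)

# flag rows where x drops compared to the previous row (pandas: df1['x'].diff() < 0)
df = df1.withColumn(
    "FLAG_DECREASE",
    F.when(F.col("x") < F.lag("x").over(w_order), 1).otherwise(0),
)

# running sum of the flag acts as a group id (pandas: .cumsum())
df = df.withColumn("GROUP_ID", F.sum("FLAG_DECREASE").over(w_cum))

# keep only the groups with more than 10 rows
counts = df.groupBy("GROUP_ID").count()
big = counts.filter(F.col("count") > 10).select("GROUP_ID")
df_kept = df.join(big, "GROUP_ID", "inner")

Is this running-sum-over-a-flag approach the right way to mimic the pandas grouping, or is there something simpler?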
Example dataframe:
x
0 1
1 2
2 2
3 2
4 3
5 3
6 4
7 5
8 4
9 3
10 2
11 1
12 2
13 3
14 4
15 5
16 5
17 6
Expected output:
group1:
x
0 1
1 2
2 2
3 2
4 3
5 3
6 4
7 5
group2:
x
0 1
1 2
2 3
3 4
4 5
5 5
6 6
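For testing on the example above, I build it like this (the idx column is just something I add so there is an explicit ordering column; note the two expected groups only have 8 and 7 rows, so on this toy data I lower the size threshold):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# example column x with an explicit idx column used for ordering
data = [1, 2, 2, 2, 3, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 5, 6]
df1 = spark.createDataFrame([(i, v) for i, v in enumerate(data)], ["idx", "x"])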