I have data like the following:
year name percent sex
1880 John 0.081541 boy
1881 William 0.080511 boy
1881 John 0.050057 boy
I need to group by and count on several different columns:
df_year = df.groupby('year').count()
df_name = df.groupby('name').count()
df_sex = df.groupby('sex').count()
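Each of the three groupby/count calls above produces one frequency table per column. Below is a plain-Python sketch of the same logic, not Spark code. It uses the sample rows from above, and `count_by_column` is a hypothetical helper. It loops over a list of column names instead of writing one line per column. In Spark, a similar loop such as `{c: df.groupBy(c).count() for c in cols}` avoids the copy-paste, but it still runs one separate aggregation per column.

```python
from collections import Counter

# The sample rows from the question, as plain dicts (illustration only).
rows = [
    {"year": 1880, "name": "John", "percent": 0.081541, "sex": "boy"},
    {"year": 1881, "name": "William", "percent": 0.080511, "sex": "boy"},
    {"year": 1881, "name": "John", "percent": 0.050057, "sex": "boy"},
]

def count_by_column(rows, columns):
    """Return {column: Counter(value -> number of rows)} for each column."""
    return {c: Counter(r[c] for r in rows) for c in columns}

counts = count_by_column(rows, ["year", "name", "sex"])
print(counts["year"])  # Counter({1881: 2, 1880: 1})
```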
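Then I use a window function to keep the top 3 values of each result by count:
from pyspark.sql import Window, functions as func
from pyspark.sql.functions import col
# df_year has one row per year, so partitioning by 'year' numbers every row 1;
# order the whole result by count instead
window = Window.orderBy(col("count").desc())
top3_res = (df_year.withColumn('topn', func.row_number().over(window))
            .filter(col('topn') <= 3).repartition(1))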
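The window step orders rows by count in descending order and keeps those whose row number is at most n. The following is a plain-Python sketch of that ranking, not Spark code. `top_n` and the counts in `year_counts` are hypothetical. The tie-breaker on the value is an assumption added here for deterministic output, since Spark's `row_number()` orders equal counts arbitrarily unless the window's `orderBy` includes a second key.

```python
from collections import Counter

def top_n(counter, n=3):
    """Keep the n most frequent values, mimicking row_number() <= n over
    an ordering by count descending. Ties are broken by the value's string
    form (an assumption made here so the result is deterministic)."""
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:n]

# Hypothetical counts, not taken from the real data.
year_counts = Counter({1880: 1, 1881: 2, 1882: 5, 1883: 2})
print(top_n(year_counts))  # [(1882, 5), (1881, 2), (1883, 2)]
```

Because `row_number()` gives distinct numbers even to equal counts, it always keeps exactly n rows. `rank()` or `dense_rank()` would instead keep every value tied at the cutoff.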
Suppose I have hundreds of columns that each need this group-by, count, and top-3 operation.
Can I do it all at once?
Or is there a better way to do it?
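One possible way to handle every column at once is to unpivot the data into (column, value) pairs, count each pair in a single aggregation, and then rank within each column. In PySpark this would roughly mean three steps, given here as an untested outline:

1. Explode an array built from `F.struct(F.lit(c).alias('column'), F.col(c).cast('string').alias('value'))` for each column `c`.
2. Run one `groupBy('column', 'value').count()`.
3. Apply `row_number()` over `Window.partitionBy('column').orderBy(F.desc('count'))`.

Values are cast to string so that every column fits into a single value column. The block below is a plain-Python sketch of the same three steps, not Spark code, and `top_n_per_column` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def top_n_per_column(rows, columns, n=3):
    """Unpivot rows into (column, value) pairs, count each pair,
    then keep the n most frequent values within each column."""
    # Step 1: unpivot every row into (column, value) pairs and count them in
    # one pass (analogue of a single groupBy('column', 'value').count()).
    # Values become strings so all columns share one value type.
    pair_counts = Counter((c, str(r[c])) for r in rows for c in columns)

    # Step 2: regroup the counts by column (analogue of partitionBy('column')).
    by_column = defaultdict(list)
    for (c, v), cnt in pair_counts.items():
        by_column[c].append((v, cnt))

    # Step 3: rank within each column by count descending (value as an
    # assumed tie-breaker) and keep the top n.
    return {
        c: sorted(vals, key=lambda vc: (-vc[1], vc[0]))[:n]
        for c, vals in by_column.items()
    }

# The sample rows from the question, as plain dicts (illustration only).
rows = [
    {"year": 1880, "name": "John", "percent": 0.081541, "sex": "boy"},
    {"year": 1881, "name": "William", "percent": 0.080511, "sex": "boy"},
    {"year": 1881, "name": "John", "percent": 0.050057, "sex": "boy"},
]

result = top_n_per_column(rows, ["year", "name", "sex"])
print(result["name"])  # [('John', 2), ('William', 1)]
```

This trades hundreds of separate aggregations for one aggregation over (row × column) pairs. The cost is that the intermediate data grows by roughly a factor of the number of columns.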