I have a DataFrame, a snippet here:
[['u1', 1], ['u2', 0]]
basically a string field named f
and either a 1 or a 0 for second element (is_fav
).
What I need to do is grouping on the first field and counting the occurrences of 1s and 0s. I was hoping to do something like
num_fav = count((col("is_fav") == 1)).alias("num_fav")
num_nonfav = count((col("is_fav") == 0)).alias("num_nonfav")
df.groupBy("f").agg(num_fav, num_nonfav)
It does not work properly, I get in both cases the same result which amounts to the count for the items in the group, so the filter (whether it is a 1 or a 0) seems to be ignored. Does this depend on how count
works?