How can I go from a DataFrame with the following structure:

col1 col2 col3
TRUE FALSE TRUE
TRUE FALSE FALSE
TRUE TRUE TRUE
TRUE TRUE FALSE
TRUE FALSE TRUE

to a result like this, using only PySpark (no pandas)?

x TRUE FALSE
col1 5 0
col2 2 3
col3 3 2

Note that the TRUE and FALSE columns hold the count of TRUE and FALSE values in each input column.

Thanks!!

  • you can try `df.selectExpr(stack_multiple_col(df)).groupby("col").pivot("values").agg(F.count("values"))` after defining this function: **[stack_multiple_col](https://stackoverflow.com/a/65047489/9840637)** – anky Jul 30 '21 at 13:50
  • how about [pivoting](http://spark.apache.org/docs/3.0.1/api/python/pyspark.sql.html#pyspark.sql.GroupedData.pivot)? – pltc Jul 31 '21 at 05:19
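
For what it's worth, the two comments above (unpivot with `stack`, then group and pivot) could be combined roughly as follows. This is only a sketch under assumptions: the columns are taken to be boolean typed (if they are the strings "TRUE"/"FALSE", the pivot values would need to be those strings instead), and `build_stack_expr` and `count_true_false` are hypothetical helper names, not the `stack_multiple_col` function from the linked answer.

```python
def build_stack_expr(columns):
    """Build a Spark SQL stack() expression that unpivots `columns`
    into one (x, value) row per original cell.

    Hypothetical helper; assumes all listed columns share one type.
    """
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    return f"stack({len(columns)}, {pairs}) as (x, value)"


def count_true_false(df):
    """Return one row per input column with counts of True and False.

    Assumes `df` is a PySpark DataFrame whose columns are all boolean.
    """
    # Wide -> long: each row becomes (column name, cell value).
    long_df = df.selectExpr(build_stack_expr(df.columns))
    return (
        long_df.groupBy("x")
        # Listing the pivot values keeps both count columns present
        # even when one of the values never occurs in the data.
        .pivot("value", [True, False])
        .count()
        # Combinations that never occur come back as null; show them as 0.
        .na.fill(0)
    )
```

With an existing DataFrame `df`, `count_true_false(df).show()` would display the result. The count columns are expected to be named `true` and `false` (Spark's string form of the pivot values), so they may need renaming to match the layout in the question exactly.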
