I have a table built with the crosstab function in PySpark, something like this:
df = sqlContext.createDataFrame( [(1,2,"a"),(3,2,"a"),(1,3,"b"),(2,2,"a"),(2,3,"b")],
["time", "value", "class"] )
tabla = df.crosstab("value","class")
tabla.withColumn("Total",tabla.a + tabla.b).show()
+-----------+---+---+-----+
|value_class|  a|  b|Total|
+-----------+---+---+-----+
|          2|  4|  0|    4|
|          4|  1|  2|    3|
|          3|  1|  4|    5|
+-----------+---+---+-----+
I need to add a new column that holds the cumulative sum of the "Total" column.
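To make the expected result concrete, here is a minimal plain-Python sketch (not Spark code) of the running total I have in mind. It assumes the rows are accumulated in the order displayed in the table above; the names `totals` and `cum_totals` are made up for this illustration.

```python
from itertools import accumulate

# "Total" values copied from the table above, in the displayed row order
# (assumption: the cumulative sum should follow that order).
totals = [4, 3, 5]

# Running total: each entry is the sum of all "Total" values up to that row.
cum_totals = list(accumulate(totals))

print(cum_totals)  # [4, 7, 12]
```

In other words, the new column would hold 4, 7 and 12 for the three rows shown.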