I have a PySpark DataFrame with a column of numbers. I need to sum that column and get the result back as an int in a Python variable.
df = spark.createDataFrame([("A", 20), ("B", 30), ("D", 80)],["Letter", "Number"])
I do the following to sum the column.
df.groupBy().sum()
But I get a dataframe back.
+-----------+
|sum(Number)|
+-----------+
|        130|
+-----------+
I would like 130 returned as an int stored in a variable, to be used elsewhere in the program.
result = 130