I have a DataFrame that consists of 3 rows and more than 20 columns (dates):
+----+-----+-----+
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  | 7   |......
| b  | 30  | 12  |......
+----+-----+-----+
I want to get the sum of each column and add it as an extra row to the DataFrame. In other words, I expect it to look like this:
+----+-----+-----+
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  | 7   |......
| b  | 30  | 12  |......
| All| 50  | 19  |......
+----+-----+-----+
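The extra "All" row is simply a column-wise sum (20 + 30 = 50, 7 + 12 = 19). As a plain-Python sketch of that target computation only, assuming hypothetical sample rows stored as dicts (this is not PySpark code):

```python
# Hypothetical sample data mirroring the two example rows; the date
# column names are taken from the tables above.
rows = [
    {"Cat": "a", "01/02": 20, "02/02": 7},
    {"Cat": "b", "01/02": 30, "02/02": 12},
]

# Every column except the label column "Cat" holds values to be summed.
value_cols = [c for c in rows[0] if c != "Cat"]

# Build the extra row: label it "All" and sum each date column over all rows.
total_row = {"Cat": "All"}
for c in value_cols:
    total_row[c] = sum(r[c] for r in rows)

# Append the totals row after the original rows.
result = rows + [total_row]
for r in result:
    print(r)
```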
I am coding in PySpark, and my script is the following:
from pyspark.sql import functions as F

for col_name in fs.columns:
    print(col_name)
    sf = df.unionAll(
        df.select([
            F.lit('Total').alias('Cat'),
            F.sum(fs.col_name).alias("{}").format(col_name)
        ])
    )
Unfortunately, I am getting the error AttributeError: 'DataFrame' object has no attribute 'col_name'. Any ideas what I am doing wrong? Thank you in advance!
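The error points at fs.col_name: dot access in Python looks up an attribute whose name is literally "col_name"; it does not substitute the value of the loop variable. The sketch below reproduces this with a hypothetical Frame class (a stand-in, not PySpark's DataFrame) whose columns are reachable both as attributes and by indexing with a string.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: each column is stored as an
    attribute and can also be fetched by name with frame[name]."""

    def __init__(self, columns):
        self._columns = list(columns)
        for name in self._columns:
            # setattr accepts any string, even names like "01/02".
            setattr(self, name, f"Column<{name}>")

    def __getitem__(self, name):
        if name not in self._columns:
            raise KeyError(name)
        return getattr(self, name)


fs = Frame(["Cat", "01/02", "02/02"])
col_name = "01/02"

error = None
try:
    fs.col_name  # looks up an attribute literally called "col_name"
except AttributeError as e:
    error = str(e)

print(error)         # the message names 'col_name', not '01/02'
print(fs[col_name])  # indexing with the variable's value finds the column
```

In PySpark the analogous string-based forms are F.sum(col_name), since pyspark.sql.functions.sum accepts a column name, or fs[col_name]. Names such as "01/02" are not valid Python identifiers, so they could never be written with dot syntax anyway. Separately, .alias("{}").format(col_name) calls .format on the Column returned by alias rather than on the string; the intended form is presumably .alias(col_name). One possible restructuring is to build every sum in a single select and union that one row once, e.g. df.select(F.lit('All').alias('Cat'), *[F.sum(c).alias(c) for c in date_cols]), where date_cols is a hypothetical list of the date column names.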