from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .getOrCreate()
data = [('James','Smith','M',3000), ('Anna','Rose','F',4100),
        ('Robert','Williams','M',6200)
]
columns = ["firstname","lastname","gender","salary"]
df = spark.createDataFrame(data=data, schema = columns)
df2 = df.select(lit("D").alias("S"), "*")
df2.show()
Output:
----------
+---+---------+--------+------+------+
|  S|firstname|lastname|gender|salary|
+---+---------+--------+------+------+
|  D|    James|   Smith|     M|  3000|
|  D|     Anna|    Rose|     F|  4100|
|  D|   Robert|Williams|     M|  6200|
+---+---------+--------+------+------+
Required Output:
- I need to add an extra trailer row marked "T" that holds the row count in the "firstname" column, as shown below. The "firstname" column can be of any type.
+---+---------+--------+------+------+
|  S|firstname|lastname|gender|salary|
+---+---------+--------+------+------+
|  D|    James|   Smith|     M|  3000|
|  D|     Anna|    Rose|     F|  4100|
|  D|   Robert|Williams|     M|  6200|
|  T|        3|        |      |      |
+---+---------+--------+------+------+
I tried creating a new DataFrame with the trailer values and applying a union, as suggested in most Stack Overflow answers, but both DataFrames must have the same number of columns. Is there a better way to put the count in the trailer row, irrespective of the type of the "firstname" column?